commit 03140572475f22e638d88aaa9410007317f376b4
Author: Greg Kroah-Hartman
Date:   Sat May 31 13:45:55 2014 -0700

    Linux 3.14.5

commit fc2ab6e8f4903f3e2da6ea0a807403fc91f8debb
Author: Eric Dumazet
Date:   Thu Apr 3 09:28:10 2014 -0700

    net-gro: reset skb->truesize in napi_reuse_skb()

    [ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ]

    Recycling skbs has always been very tough...

    This time it appears the GRO layer can accumulate skb->truesize adjustments made by drivers when they attach a fragment to an skb.

    skb_gro_receive() can only subtract from skb->truesize the used part of a fragment.

    I spotted this problem by seeing unexpected TcpExtPruneCalled and TcpExtTCPRcvCollapsed counts with a recent kernel, where the TCP receive window should be sized properly to accept traffic coming from a driver that does not overshoot skb->truesize.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 242339a25ed829432d99e6571d170b308432ed55
Author: Li RongQing
Date:   Thu May 22 16:36:55 2014 +0800

    ipv4: initialise the itag variable in __mkroute_input

    [ Upstream commit fbdc0ad095c0a299e9abf5d8ac8f58374951149a ]

    The value of itag is a random value from the stack, and it may not be initialised by fib_validate_source(), which calls fib_combine_itag(), if CONFIG_IP_ROUTE_CLASSID is not set.

    This makes the cached dst uncertain.

    Signed-off-by: Li RongQing
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit ff9cfa6b917a903fe24dba928dca61d0c9339864
Author: Eric Dumazet
Date:   Mon May 19 21:56:34 2014 -0700

    ipv6: gro: fix CHECKSUM_COMPLETE support

    [ Upstream commit 4de462ab63e23953fd05da511aeb460ae10cc726 ]

    When GRE support was added in linux-3.14, CHECKSUM_COMPLETE handling broke on GRE+IPv6 because we did not update/use the appropriate csum: the GRO layer is supposed to use/update NAPI_GRO_CB(skb)->csum instead of skb->csum.

    Tested using a GRE tunnel and IPv6 traffic.
    GRO aggregation now happens at the first level (ethernet device) instead of being done in the gre tunnel. Native IPv6+TCP is still properly aggregated.

    Fixes: bf5a755f5e918 ("net-gre-gro: Add GRE support to the GRO stack")
    Signed-off-by: Eric Dumazet
    Cc: Jerry Chu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 1ff9c00a7a14af5c848d00eb264cb4e6c9cb30a6
Author: Cong Wang
Date:   Mon May 19 12:15:49 2014 -0700

    net_sched: fix an oops in tcindex filter

    [ Upstream commit bf63ac73b3e132e6bf0c8798aba7b277c3316e19 ]

    Kelly reported the following crash:

    IP: [] tcf_action_exec+0x46/0x90
    PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
    RIP: 0010:[] [] tcf_action_exec+0x46/0x90
    RSP: 0018:ffff8800d21b9b90 EFLAGS: 00010283
    RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
    RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
    RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
    R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
    R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
    FS: 00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
    Stack:
     ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
     ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
     ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
    Call Trace:
     [] tcindex_classify+0x88/0x9b
     [] tc_classify_compat+0x3e/0x7b
     [] tc_classify+0x25/0x9f
     [] htb_enqueue+0x55/0x27a
     [] dsmark_enqueue+0x165/0x1a4
     [] __dev_queue_xmit+0x35e/0x536
     [] dev_queue_xmit+0x10/0x12
     [] packet_sendmsg+0xb26/0xb9a
     [] ? __lock_acquire+0x3ae/0xdf3
     [] __sock_sendmsg_nosec+0x25/0x27
     [] sock_aio_write+0xd0/0xe7
     [] do_sync_write+0x59/0x78
     [] vfs_write+0xb5/0x10a
     [] SyS_write+0x49/0x7f
     [] system_call_fastpath+0x16/0x1b

    This happens because we memcpy struct tcindex_filter_result, which contains struct tcf_exts; a struct list_head obviously cannot simply be copied. This is a regression introduced by commit 33be627159913b094bb578 (net_sched: act: use standard struct list_head).

    It is not very easy to fix, as the code is a mess:

        if (old_r)
            memcpy(&cr, r, sizeof(cr));
        else {
            memset(&cr, 0, sizeof(cr));
            tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
        }
        ...
        tcf_exts_change(tp, &cr.exts, &e);
        ...
        memcpy(r, &cr, sizeof(cr));

    After this change, the above code should be equivalent to:

        tcindex_filter_result_init(&cr);
        if (old_r)
            cr.res = r->res;
        ...
        if (old_r)
            tcf_exts_change(tp, &r->exts, &e);
        else
            tcf_exts_change(tp, &cr.exts, &e);
        ...
        r->res = cr.res;

    since there is no longer any need to copy struct tcf_exts.

    It also fixes other places that zero structs containing struct tcf_exts.

    Fixes: commit 33be627159913b0 (net_sched: act: use standard struct list_head)
    Reported-by: Kelly Anderson
    Tested-by: Kelly Anderson
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 49da0655e28b5bf9d869e1c50c15c990c5db5f94
Author: Steffen Klassert
Date:   Mon May 19 11:36:56 2014 +0200

    ip_tunnel: Initialize the fallback device properly

    [ Upstream commit 78ff4be45a4c51d8fb21ad92e4fabb467c6c3eeb ]

    We need to initialize the fallback device to have a correct mtu set on this device. Otherwise the mtu is set to null and the device is unusable.

    Fixes: fd58156e456d ("IPIP: Use ip-tunneling code.")
    Cc: Pravin B Shelar
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 39f95dfd2c76415d03489aef9674638e00c95055
Author: Antonio Quartulli
Date:   Sat Mar 29 17:27:38 2014 +0100

    batman-adv: fix local TT check for outgoing arp requests in DAT

    [ Upstream commit cc2f33860cea0e48ebec096130bd0f7c4bf6e0bc ]

    The change introduced by 88e48d7b3340ef07b108eb8a8b3813dd093cc7f7 ("batman-adv: make DAT drop ARP requests targeting local clients") implements a check that prevents DAT from using the caching mechanism when the client that is supposed to provide a reply to an arp request is local.

    However, the change brought by be1db4f6615b5e6156c807ea8985171c215c2d57 ("batman-adv: make the Distributed ARP Table vlan aware") did not convert the above check into its vlan-aware version, thus making it useless when the local client is behind a vlan.

    Fix the behaviour by properly specifying the vlan when checking whether a client is local or not.

    Reported-by: Simon Wunderlich
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner
    Signed-off-by: Greg Kroah-Hartman

commit 32eb568e09bfdf10bd90c8478e5a79b5f9f3c841
Author: Antonio Quartulli
Date:   Fri May 2 01:35:13 2014 +0200

    batman-adv: increase orig refcount when storing ref in gw_node

    [ Upstream commit 377fe0f968b30a1a714fab53a908061914f30e26 ]

    A pointer to the orig_node representing a bat-gateway is stored in the gw_node->orig_node member, but the refcount for such orig_node is never increased. This leads to memory faults when gw_node->orig_node is accessed and the originator has already been freed.

    Fix this by increasing the refcount on gw_node creation and decreasing it on gw_node free.
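    The pairing described above (take a reference when the pointer is stored, drop it when the holder is freed) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the batman-adv code: `orig_sketch`, `gw_sketch`, and every helper name are hypothetical, and a plain `int` stands in for the kernel's atomic reference counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an originator object with a reference count. */
struct orig_sketch {
    int refcount;
};

/* Hypothetical stand-in for a gateway node that stores an originator pointer. */
struct gw_sketch {
    struct orig_sketch *orig;
};

static struct orig_sketch *orig_new(void)
{
    struct orig_sketch *o = malloc(sizeof(*o));

    if (o)
        o->refcount = 1; /* the creator holds the first reference */
    return o;
}

/* Drop one reference; returns 1 if this call freed the object. */
static int orig_put(struct orig_sketch *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}

/* Storing the pointer takes a reference, so the object outlives other users. */
static struct gw_sketch *gw_new(struct orig_sketch *o)
{
    struct gw_sketch *gw = malloc(sizeof(*gw));

    if (!gw)
        return NULL;
    o->refcount++;
    gw->orig = o;
    return gw;
}

/* Freeing the holder drops the reference taken in gw_new(). */
static int gw_free(struct gw_sketch *gw)
{
    int freed = orig_put(gw->orig);

    free(gw);
    return freed;
}
```

    With this pairing, the stored pointer stays valid after the creator drops its own reference, and the object is released only when the holder is freed.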
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner
    Signed-off-by: Greg Kroah-Hartman

commit e450a9b46e0d1c6d0b94a63d0d5fd4273d45de24
Author: Antonio Quartulli
Date:   Wed Apr 23 14:05:16 2014 +0200

    batman-adv: fix reference counting imbalance while sending fragment

    [ Upstream commit be181015a189cd141398b761ba4e79d33fe69949 ]

    In the new fragmentation code the batadv_frag_send_packet() function obtains a reference to the primary_if, but it does not release it upon return.

    This reference imbalance prevents the primary_if (and then the related netdevice) from being properly released on shutdown.

    Fix this by releasing the primary_if in batadv_frag_send_packet().

    Introduced by ee75ed88879af88558818a5c6609d85f60ff0df4 ("batman-adv: Fragment and send skbs larger than mtu")

    Cc: Martin Hundebøll
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner
    Acked-by: Martin Hundebøll
    Signed-off-by: Greg Kroah-Hartman

commit b6bc7a13cab3966f03d3efd79892c8d2f085b919
Author: Marek Lindner
Date:   Thu Apr 24 03:44:25 2014 +0800

    batman-adv: fix indirect hard_iface NULL dereference

    [ Upstream commit 16a4142363b11952d3aa76ac78004502c0c2fe6e ]

    If hard_iface is NULL and the code jumps to the out label, batadv_hardif_free_ref() doesn't check for NULL before dereferencing it to get to the refcount.

    Introduced in cb1c92ec37fb70543d133a1fa7d9b54d6f8a1ecd ("batman-adv: add debugfs support to view multiif tables").

    Reported-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Acked-by: Antonio Quartulli
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

commit d83924d7681e7048d639debb614bdd1d9183d201
Author: Eric Dumazet
Date:   Fri May 16 11:34:37 2014 -0700

    net: gro: make sure skb->cb[] initial content has not to be zero

    [ Upstream commit 29e98242783ed3ba569797846a606ba66f781625 ]

    Starting from linux-3.13, GRO attempts to build full size skbs.

    The problem is that the commit assumed one particular field in skb->cb[] was clean, but that is not the case on some stacked devices.
    Timo reported a crash in case traffic is decrypted before reaching a GRE device.

    Fix this by initializing NAPI_GRO_CB(skb)->last at the right place; this also removes one conditional.

    Thanks a lot to Timo for providing full reports and bisecting this.

    Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb")
    Bisected-by: Timo Teras
    Signed-off-by: Eric Dumazet
    Tested-by: Timo Teräs
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 2132c5ea87fe6061762813b4f84b3846347de09c
Author: Nikolay Aleksandrov
Date:   Thu May 15 13:35:23 2014 +0200

    bonding: fix out of range parameters for bond_intmax_tbl

    [ Upstream commit 81c708068dfedece038e07d818ba68333d8d885d ]

    I forgot to add a NULL entry to the bond_intmax_tbl when I introduced it with the conversion of arp_interval, so add it now.

    CC: Jay Vosburgh
    CC: Veaceslav Falico
    CC: Andy Gospodarek
    Fixes: 7bdb04ed0dbf ("bonding: convert arp_interval to use the new option API")
    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Veaceslav Falico
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit cd7faf359dc0eeb672c820d2308a71b1c1fbadf9
Author: Guenter Roeck
Date:   Wed May 14 13:12:49 2014 -0700

    net: phy: Don't call phy_resume if phy_init_hw failed

    [ Upstream commit b394745df2d9d4c30bf1bcc55773bec6f3bc7c67 ]

    After the call to phy_init_hw failed in phy_attach_direct, phy_detach is called to detach the phy device from its network device. If the attached driver is a generic phy driver, this also detaches the driver. Subsequently phy_resume is called, which assumes without checking that a driver is attached to the device. This will result in a crash such as

    Unable to handle kernel paging request for data at address 0xffffffffffffff90
    Faulting instruction address: 0xc0000000003a0e18
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP [c0000000003a0e18] .phy_attach_direct+0x68/0x17c
    LR [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c
    Call Trace:
    [c0000003fc0475d0] [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c (unreliable)
    [c0000003fc047670] [c0000000003a0ff8] .phy_connect_direct+0x28/0x98
    [c0000003fc047700] [c0000000003f0074] .of_phy_connect+0x4c/0xa4

    Only call phy_resume if phy_init_hw was successful.

    Signed-off-by: Guenter Roeck
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit af8c0e0612f5288b895d427f275d8ca436efab24
Author: Cong Wang
Date:   Mon May 12 15:11:20 2014 -0700

    rtnetlink: wait for unregistering devices in rtnl_link_unregister()

    [ Upstream commit 200b916f3575bdf11609cb447661b8d5957b0bbf ]

    From: Cong Wang

    commit 50624c934db18ab90 (net: Delay default_device_exit_batch until no devices are unregistering) introduced rtnl_lock_unregistering() for default_device_exit_batch(). The same race can happen when we rmmod a driver that calls rtnl_link_unregister(), as we call dev->destructor without the rtnl lock.

    In the long term, I think we should clean up the mess of netdev_run_todo() and the net namespace exit code.

    Cc: Eric W. Biederman
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 65117efb2b65ef69c1c243b63296142164d152ee
Author: Hannes Frederic Sowa
Date:   Sun May 11 23:01:13 2014 +0200

    ipv6: fix calculation of option len in ip6_append_data

    [ Upstream commit 3a1cebe7e05027a1c96f2fc1a8eddf5f19b78f42 ]

    tot_len does specify the size of struct ipv6_txoptions. We need opt_flen + opt_nflen to calculate the overall length of additional ipv6 extensions.

    I found this while auditing the ipv6 output path for a memory corruption reported by Alexey Preobrazhensky while he fuzzed an instrumented AddressSanitizer kernel with trinity. This may or may not be the cause of the original bug.
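    The distinction drawn above can be shown with a small userspace sketch. The struct below is a hypothetical reduction that keeps only the three fields named in the message (`tot_len`, `opt_flen`, `opt_nflen`); the helper name and any sample values are assumptions for illustration, not the kernel code.

```c
#include <stddef.h>

/* Hypothetical reduction of the options structure to its length fields. */
struct txoptions_sketch {
    int tot_len;              /* size of the options structure itself */
    unsigned short opt_flen;  /* extension headers placed after the fragment header */
    unsigned short opt_nflen; /* extension headers placed before the fragment header */
};

/* Bytes of extension headers added to a packet; this is not the structure size. */
static unsigned int ext_hdr_len(const struct txoptions_sketch *opt)
{
    if (opt == NULL)
        return 0;
    return (unsigned int)opt->opt_flen + opt->opt_nflen;
}
```

    Using tot_len in place of this sum would count the bookkeeping structure rather than the bytes that actually go on the wire.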
    Fixes: 4df98e76cde7c6 ("ipv6: pmtudisc setting not respected with UFO/CORK")
    Reported-by: Alexey Preobrazhensky
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit adeb3fe4ef6621793d7f1d6f0b9c9cc88827c5b7
Author: Hannes Frederic Sowa
Date:   Sun May 11 22:59:30 2014 +0200

    net: avoid dependency of net_get_random_once on nop patching

    [ Upstream commit 3d4405226d27b3a215e4d03cfa51f536244e5de7 ]

    net_get_random_once depends on the static keys infrastructure to patch up the branch to the slow path during boot. This was realized by abusing the static keys api and defining a new initializer to not enable the call site while still indicating that the branch point should get patched up. This was needed to have the fast path considered likely by gcc.

    The static key initialization during boot up normally walks through all the registered keys and either patches in ideal nops or enables the jump site, but it omitted that step on x86 if ideal nops were already placed at static_key branch points. Thus net_get_random_once branches did not always become active.

    This patch switches net_get_random_once to the ordinary static_key api and thus places the kernel fast path in the path that gcc considers unlikely. Microbenchmarks on Intel and AMD x86-64 showed that the unlikely path actually beats the likely path in terms of cycle cost and that different nop patterns did not make much difference, thus this switch should not be noticeable.

    Fixes: a48e42920ff38b ("net: introduce new macro net_get_random_once")
    Reported-by: Tuomas Räsänen
    Cc: Linus Torvalds
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 4d3b152a2e4d79a6151c396e173e599e54ac6dfe
Author: Heiko Carstens
Date:   Wed May 14 09:48:21 2014 +0200

    net: filter: s390: fix JIT address randomization

    [ Upstream commit e84d2f8d2ae33c8215429824e1ecf24cbca9645e ]

    This is the s390 variant of Alexei's JIT bug fix.
    (patch description below stolen from Alexei's patch)

    bpf_alloc_binary() adds 128 bytes of room to JITed program image and rounds it up to the nearest page size. If image size is close to page size (like 4000), it is rounded to two pages:

        round_up(4000 + 4 + 128) == 8192

    then 'hole' is computed as

        8192 - (4000 + 4) = 4188

    If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header) then kernel will crash during bpf_jit_free():

    kernel BUG at arch/x86/mm/pageattr.c:887!
    Call Trace:
     [] change_page_attr_set_clr+0x135/0x460
     [] ? _raw_spin_unlock_irq+0x30/0x50
     [] set_memory_rw+0x2f/0x40
     [] bpf_jit_free_deferred+0x2d/0x60
     [] process_one_work+0x1d8/0x6a0
     [] ? process_one_work+0x178/0x6a0
     [] worker_thread+0x11c/0x370

    since bpf_jit_free() does:

        unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
        struct bpf_binary_header *header = (void *)addr;

    to compute start address of 'bpf_binary_header' and header->pages will pass junk to:

        set_memory_rw(addr, header->pages);

    Fix it by making sure that &header->image[prandom_u32() % hole] and &header are in the same page.

    Fixes: aa2d2c73c21f2 ("s390/bpf,jit: address randomize and write protect jit code")
    Reported-by: Alexei Starovoitov
    Cc: # v3.11+
    Signed-off-by: Heiko Carstens
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 070a08cd74daf53cd97f52f17636cc723c5cb370
Author: Alexei Starovoitov
Date:   Tue May 13 15:05:55 2014 -0700

    net: filter: x86: fix JIT address randomization

    [ Upstream commit 773cd38f40b8834be991dbfed36683acc1dd41ee ]

    bpf_alloc_binary() adds 128 bytes of room to JITed program image and rounds it up to the nearest page size. If image size is close to page size (like 4000), it is rounded to two pages:

        round_up(4000 + 4 + 128) == 8192

    then 'hole' is computed as

        8192 - (4000 + 4) = 4188

    If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header) then kernel will crash during bpf_jit_free():

    kernel BUG at arch/x86/mm/pageattr.c:887!
    Call Trace:
     [] change_page_attr_set_clr+0x135/0x460
     [] ? _raw_spin_unlock_irq+0x30/0x50
     [] set_memory_rw+0x2f/0x40
     [] bpf_jit_free_deferred+0x2d/0x60
     [] process_one_work+0x1d8/0x6a0
     [] ? process_one_work+0x178/0x6a0
     [] worker_thread+0x11c/0x370

    since bpf_jit_free() does:

        unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
        struct bpf_binary_header *header = (void *)addr;

    to compute start address of 'bpf_binary_header' and header->pages will pass junk to:

        set_memory_rw(addr, header->pages);

    Fix it by making sure that &header->image[prandom_u32() % hole] and &header are in the same page

    Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit cd70e679a9c27a081264a2c40461563c486613b8
Author: Simon Wunderlich
Date:   Wed Mar 26 15:46:24 2014 +0100

    batman-adv: fix removing neigh_ifinfo

    [ Upstream commit 709de13f0c532fe9c468c094aff069a725ed57fe ]

    When an interface is removed separately, all neighbors need to be checked if they have a neigh_ifinfo structure for that particular interface. If that is the case, remove that ifinfo so any references to a hard interface can be freed.

    This is a regression introduced by 89652331c00f43574515059ecbf262d26d885717 ("batman-adv: split tq information in neigh_node struct")

    Reported-by: Antonio Quartulli
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

commit c53de6e7451ac60f18754e81962651fdfafa4226
Author: Simon Wunderlich
Date:   Wed Mar 26 15:46:23 2014 +0100

    batman-adv: always run purge_orig_neighbors

    [ Upstream commit 7b955a9fc164487d7c51acb9787f6d1b01b35ef6 ]

    The current code will not execute batadv_purge_orig_neighbors() when an orig_ifinfo has already been purged. However we need to run it in any case. Fix that.
    This is a regression introduced by 7351a4822d42827ba0110677c0cbad88a3d52585 ("batman-adv: split out router from orig_node")

    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

commit ea859052dd85388cbfd20bf252d4db92123344b9
Author: Simon Wunderlich
Date:   Wed Mar 26 15:46:22 2014 +0100

    batman-adv: fix neigh reference imbalance

    [ Upstream commit 000c8dff97311357535d64539e58990526e4de70 ]

    When an interface is removed from batman-adv, the orig_ifinfo of an orig_node may be removed without releasing the router first. This prevents the reference for the neighbor pointed at by orig_ifinfo->router from being released, and this leak may result in reference leaks for the interface used by this neighbor. Fix that.

    This is a regression introduced by 7351a4822d42827ba0110677c0cbad88a3d52585 ("batman-adv: split out router from orig_node").

    Reported-by: Antonio Quartulli
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

commit 1adf643050df1289c36dd158baa5d4cc2fa593ae
Author: Simon Wunderlich
Date:   Wed Mar 26 15:46:21 2014 +0100

    batman-adv: fix neigh_ifinfo imbalance

    [ Upstream commit c1e517fbbcdb13f50662af4edc11c3251fe44f86 ]

    The neigh_ifinfo object must be freed if it has been used in batadv_iv_ogm_process_per_outif().
    This is a regression introduced by 89652331c00f43574515059ecbf262d26d885717 ("batman-adv: split tq information in neigh_node struct")

    Reported-by: Antonio Quartulli
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

commit c712c1f79c8e074cb4568231c3b034519896108d
Author: Duan Jiong
Date:   Fri May 9 13:16:48 2014 +0800

    neigh: set nud_state to NUD_INCOMPLETE when probing router reachability

    [ Upstream commit 2176d5d41891753774f648b67470398a5acab584 ]

    Since commit 7e98056964("ipv6: router reachability probing"), a router that falls into NUD_FAILED will be probed.

    Now if function rt6_select() selects a router whose neighbour state is NUD_FAILED, and at the same time function rt6_probe() changes the neighbour state to NUD_PROBE, then function dst_neigh_output() can directly send packets, but the neighbour is actually still unreachable. If we set nud_state to NUD_INCOMPLETE instead of NUD_PROBE, packets will not be sent out until the neighbour is reachable.

    In addition, because the route should be probed with a single NS, we must set neigh->probes to neigh_max_probes(); then, when the neigh timer times out, function neigh_timer_handler() will not send other NS messages.

    Signed-off-by: Duan Jiong
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit a3db451d65cb1b0195ebf19798811963b30173ea
Author: Susant Sahani
Date:   Sat May 10 00:11:32 2014 +0530

    ip6_tunnel: fix potential NULL pointer dereference

    [ Upstream commit c8965932a2e3b70197ec02c6741c29460279e2a8 ]

    The function ip6_tnl_validate assumes that the rtnl attribute IFLA_IPTUN_PROTO is always filled. If this attribute is not filled by the userspace application, the kernel crashes with a NULL pointer dereference. This patch fixes the potential kernel crash when IFLA_IPTUN_PROTO is missing.

    Signed-off-by: Susant Sahani
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit e31895fa971b7550a35c482b55666be16bd1c307
Author: Nikolay Aleksandrov
Date:   Fri May 9 11:11:39 2014 +0200

    sfc: fix calling of free_irq with already free vector

    [ Upstream commit 1c3639005f48492e5f2d965779efd814e80f8b15 ]

    If the sfc driver is in legacy interrupt mode (either explicitly by using the interrupt_mode module param or by falling back to it), it will hit a warning at kernel/irq/manage.c. The MSI(X) irqs are zero and the driver tries to free them unconditionally, so it ends up trying to free an irq that it did not allocate in the first place. Fix it by checking whether we're in legacy mode and freeing the appropriate irqs.

    CC: Zenghui Shi
    CC: Ben Hutchings
    CC:
    CC: Shradha Shah
    CC: David S. Miller
    Fixes: 1899c111a535 ("sfc: Fix IRQ cleanup in case of a probe failure")
    Reported-by: Zenghui Shi
    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Shradha Shah
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 064651306507908bb68de656f10227311605bd08
Author: Peter Christensen
Date:   Thu May 8 11:15:37 2014 +0200

    macvlan: Don't propagate IFF_ALLMULTI changes on down interfaces.

    [ Upstream commit bbeb0eadcf9fe74fb2b9b1a6fea82cd538b1e556 ]

    Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti overflow on the underlying interface.

    Attempting to set IFF_ALLMULTI on the underlying interface would cause an error and the log message:

    "allmulti touches root, set allmulti failed."

    Signed-off-by: Peter Christensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit d177ade5a57be857df1ddbdfc06627d31f11ef06
Author: Bjørn Mork
Date:   Fri May 9 14:45:00 2014 +0200

    net: cdc_mbim: handle unaccelerated VLAN tagged frames

    [ Upstream commit 6b5eeb7f874b689403e52a646e485d0191ab9507 ]

    This driver maps 802.1q VLANs to MBIM sessions. The mapping is based on a bogus assumption that all tagged frames will use the acceleration API because we enable NETIF_F_HW_VLAN_CTAG_TX. This fails, for example, for frames tagged in userspace using packet sockets. Such frames will erroneously be considered as untagged and silently dropped based on not being IP.

    Fix by falling back to looking into the ethernet header for a tag if no accelerated tag was found.

    Fixes: a82c7ce5bc5b ("net: cdc_ncm: map MBIM IPS SessionID to VLAN ID")
    Cc: Greg Suarez
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 13c5c91b6b4a68dd9ac5b4ea9974c0d1402daa0f
Author: Sergey Popovich
Date:   Tue May 6 18:23:08 2014 +0300

    ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation

    [ Upstream commit aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd ]

    Increment fib_info_cnt in fib_create_info() right after successfully allocating the fib_info structure; otherwise a fib_metrics allocation failure leads to fib_info_cnt being incorrectly decremented in free_fib_info(), which is called on the error path from fib_create_info().

    Signed-off-by: Sergey Popovich
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 8eb6ded767268d02e88155b4ced544cebe2ba495
Author: Florian Westphal
Date:   Mon May 5 00:03:34 2014 +0200

    net: ipv6: send pkttoobig immediately if orig frag size > mtu

    [ Upstream commit 418a31561d594a2b636c1e2fa94ecd9e1245abb1 ]

    If conntrack defragments incoming ipv6 frags it stores the largest original frag size in ip6cb and sets ->local_df.

    We must thus first test the largest original frag size vs. mtu, and not vice versa.

    Without this patch PKTTOOBIG is still generated in ip6_fragment() later in the stack, but

    1) IPSTATS_MIB_INTOOBIGERRORS won't increment
    2) packet did (needlessly) traverse netfilter postrouting hook.

    Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 1f68835d92bbfe3a14d369d3dce123c94521b105
Author: Florian Westphal
Date:   Sun May 4 23:24:31 2014 +0200

    net: ipv4: ip_forward: fix inverted local_df test

    [ Upstream commit ca6c5d4ad216d5942ae544bbf02503041bd802aa ]

    local_df means 'ignore DF bit if set', so if it's set we're allowed to perform ip fragmentation.

    This wasn't noticed earlier because the output path also drops such skbs (and emits the needed icmp error) and because netfilter ip defrag did not set local_df until a couple of days ago.

    The only difference is that DF packets larger than the MTU are now discarded earlier (e.g. we avoid a pointless netfilter postrouting trip).

    While at it, drop the repeated ip_exceeds_mtu test; checking it once is enough...

    Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit b8313dbb84b1bd2383446d944d6d9d5a384a065f
Author: Bjørn Mork
Date:   Sat May 3 16:12:47 2014 +0200

    net: cdc_mbim: __vlan_find_dev_deep need rcu_read_lock

    [ Upstream commit 4f4178f3bb1f470d7fb863ec531e08e20a0fd51c ]

    Fixes this warning introduced by commit 5b8f15f78e6f ("net: cdc_mbim: handle IPv6 Neigbor Solicitations"):

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.15.0-rc3 #213 Tainted: G W O
    -------------------------------
    net/8021q/vlan_core.c:69 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 1
    no locks held by ksoftirqd/0/3.
    stack backtrace:
    CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W O 3.15.0-rc3 #213
    Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
     0000000000000001 ffff880232533bf0 ffffffff813a5ee6 0000000000000006
     ffff880232530090 ffff880232533c20 ffffffff81076b94 0000000000000081
     0000000000000000 ffff8802085ac000 ffff88007fc8ea00 ffff880232533c50
    Call Trace:
     [] dump_stack+0x4e/0x68
     [] lockdep_rcu_suspicious+0xfa/0x103
     [] __vlan_find_dev_deep+0x54/0x94
     [] cdc_mbim_rx_fixup+0x379/0x66a [cdc_mbim]
     [] ? _raw_spin_unlock_irqrestore+0x3a/0x49
     [] ? trace_hardirqs_on_caller+0x192/0x1a1
     [] usbnet_bh+0x59/0x287 [usbnet]
     [] tasklet_action+0xbb/0xcd
     [] __do_softirq+0x14c/0x30d
     [] run_ksoftirqd+0x1f/0x50
     [] smpboot_thread_fn+0x172/0x18e
     [] ? SyS_setgroups+0xdf/0xdf
     [] kthread+0xb5/0xbd
     [] ? __wait_for_common+0x13b/0x170
     [] ? __kthread_parkme+0x5c/0x5c
     [] ret_from_fork+0x7c/0xb0
     [] ? __kthread_parkme+0x5c/0x5c

    Fixes: 5b8f15f78e6f ("net: cdc_mbim: handle IPv6 Neigbor Solicitations")
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 0bee206acd1bdbd2bb2e8116a78a12f7b7cb39ef
Author: Timo Teräs
Date:   Fri May 16 08:34:39 2014 +0300

    ipv4: ip_tunnels: disable cache for nbma gre tunnels

    [ Upstream commit 22fb22eaebf4d16987f3fd9c3484c436ee0badf2 ]

    The connected check fails to check for ip_gre nbma mode tunnels properly. ip_gre creates temporary tnl_params with daddr specified to pass-in the actual target on per-packet basis from neighbor layer. Detect these tunnels by inspecting the actual tunnel configuration.
    Minimal test case:

        ip route add 192.168.1.1/32 via 10.0.0.1
        ip route add 192.168.1.2/32 via 10.0.0.2
        ip tunnel add nbma0 mode gre key 1 tos c0
        ip addr add 172.17.0.0/16 dev nbma0
        ip link set nbma0 up
        ip neigh add 172.17.0.1 lladdr 192.168.1.1 dev nbma0
        ip neigh add 172.17.0.2 lladdr 192.168.1.2 dev nbma0
        ping 172.17.0.1
        ping 172.17.0.2

    The second ping should go to 192.168.1.2 via 10.0.0.2, but the cached gre tunnel level route is used and it actually goes to 192.168.1.1 via 10.0.0.1.

    The lladdrs need to go to separate dsts for the bug to trigger. The test case uses separate route entries, but this can also happen when the route entry is the same: if there is a nexthop exception, or the GRE tunnel is IPsec'ed, in which case the dst points to an xfrm bundle unique to the gre lladdr.

    Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels")
    Signed-off-by: Timo Teräs
    Cc: Tom Herbert
    Cc: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 22b2efbb5aa8d97b5f2b7a030dea626d97c87a09
Author: Ying Cai
Date:   Sun May 4 15:20:04 2014 -0700

    ip_tunnel: Set network header properly for IP_ECN_decapsulate()

    [ Upstream commit e96f2e7c430014eff52c93cabef1ad4f42ed0db1 ]

    In ip_tunnel_rcv(), set skb->network_header to the inner IP header before IP_ECN_decapsulate().

    Without the fix, IP_ECN_decapsulate() takes the outer IP header as the inner IP header, possibly causing error messages or packet drops.

    Note that this skb_reset_network_header() call was in this spot when the original feature for checking consistency of ECN bits through tunnels was added in eccc1bb8d4b4 ("tunnel: drop packet if ECN present with not-ECT"). It was only removed from this spot in 3d7b46cd20e3 ("ip_tunnel: push generic protocol handling to ip_tunnel module.").

    Fixes: 3d7b46cd20e3 ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
    Reported-by: Neal Cardwell
    Signed-off-by: Ying Cai
    Acked-by: Neal Cardwell
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 935ec25dfa56f012af48155e9c2a35a7a10fdaea
Author: Eyal Perry
Date:   Sun May 4 17:07:25 2014 +0300

    net/mlx4_core: Don't issue PCIe speed/width checks for VFs

    [ Upstream commit 83d3459a5928f18c9344683e31bc2a7c3c25562a ]

    Carrying out PCI speed/width checks through pcie_get_minimum_link() on VFs yields wrong results, so remove them.

    Fixes: b912b2f ('net/mlx4_core: Warn if device doesn't have enough PCI bandwidth')
    Signed-off-by: Eyal Perry
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 19ca28c306fcf825c165f6f3b9532bec479a60b4
Author: Bjørn Mork
Date:   Fri May 2 23:27:00 2014 +0200

    net: cdc_ncm: fix buffer overflow

    [ Upstream commit 9becd707841207652449a8dfd90fe9c476d88546 ]

    Commit 4d619f625a60 ("net: cdc_ncm: no point in filling up the NTBs if we send ZLPs") changed the padding logic for devices with the ZLP flag set. This meant that frames of any size will be sent without additional padding, except for the single byte added if the size is a multiple of the USB packet size. But if the unpadded size is identical to the maximum frame size, and the maximum size is a multiple of the USB packet size, then this one-byte padding will overflow the buffer.

    Prevent padding if already at maximum frame size, letting usbnet transmit a ZLP instead in this case.

    Fixes: 4d619f625a60 ("net: cdc_ncm: no point in filling up the NTBs if we send ZLPs")
    Reported-by: Yu-an Shih
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 1b3ac8488ee4a5dd42b6c1809ca8e80fb64c549c
Author: Andy King
Date:   Thu May 1 15:20:43 2014 -0700

    vsock: Make transport the proto owner

    [ Upstream commit 2c4a336e0a3e203fab6aa8d8f7bb70a0ad968a6b ]

    Right now the core vsock module is the owner of the proto family. This means there's nothing preventing the transport module from unloading if there are open sockets, which results in a panic.
Fix that by allowing the transport to be the owner, which will refcount it properly. Includes version bump to 1.0.1.0-k Passes checkpatch this time, I swear... Acked-by: Dmitry Torokhov Signed-off-by: Andy King Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ab7ba76731a124ef9a70b07004c63b09e260b0d4 Author: John Fastabend Date: Thu May 1 09:23:06 2014 -0700 net: sched: lock imbalance in hhf qdisc [ Upstream commit f6a082fed1e6407c2f4437d0d963b1bcbe5f9f58 ] hhf_change() takes the sch_tree_lock and releases it but misses the error cases. Fix the missed case here. To reproduce try a command like this, # tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000 Signed-off-by: John Fastabend Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b2bb9ffdced59893208af375086e0cdb17a8d3c3 Author: Liu Yu Date: Wed Apr 30 17:34:09 2014 +0800 tcp_cubic: fix the range of delayed_ack [ Upstream commit 0cda345d1b2201dd15591b163e3c92bad5191745 ] commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger Signed-off-by: Liu Yu Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit cea2f34c5fa7f414f648fc7034ce76d5eb0d48ff Author: Vlad Yasevich Date: Tue Apr 29 10:09:51 2014 -0400 Revert "macvlan : fix checksums error when we are in bridge mode" [ Upstream commit f114890cdf84d753f6b41cd0cc44ba51d16313da ] This reverts commit 12a2856b604476c27d85a5f9a57ae1661fc46019. The commit above doesn't appear to be necessary any more as the checksums appear to be correctly computed/validated. Additionally the above commit breaks kvm configurations where one VM is using a device that supports checksum offload (virtio) and the other VM does not. In this case, packets leaving virtio device will have CHECKSUM_PARTIAL set. The packets are forwarded to a macvtap that has offload features turned off. Since we use CHECKSUM_UNNECESSARY, the host does not update the checksum and thus a bad checksum is passed up to the guest. CC: Daniel Lezcano CC: Patrick McHardy CC: Andrian Nord CC: Eric Dumazet CC: Michael S. Tsirkin CC: Jason Wang Signed-off-by: Vlad Yasevich Acked-by: Michael S. Tsirkin Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bdccb452699c0da6df5538904bbf4847de1cf4fc Author: Vlad Yasevich Date: Tue Apr 29 10:09:50 2014 -0400 mactap: Fix checksum errors for non-gso packets in bridge mode [ Upstream commit cbdb04279ccaefcc702c8757757eea8ed76e50cf ] The following is a problematic configuration: VM1: virtio-net device connected to macvtap0@eth0 VM2: e1000 device connected to macvtap1@eth0 The problem is that virtio-net supports checksum offloading and thus sends the packets to the host with CHECKSUM_PARTIAL set. On the other hand, e1000 does not support any acceleration. For small TCP packets (and this includes the 3-way handshake), e1000 ends up receiving packets that only have a partial checksum set. This causes TCP to fail checksum validation and to drop packets. As a result tcp connections cannot be established.
Commit 3e4f8b787370978733ca6cae452720a4f0c296b8 ("macvtap: Perform GSO on forwarding path.") fixes this issue for large packets that will end up undergoing GSO. This commit adds a check for the non-GSO case and attempts to compute the checksum for partially checksummed packets in the non-GSO case. CC: Daniel Lezcano CC: Patrick McHardy CC: Andrian Nord CC: Eric Dumazet CC: Michael S. Tsirkin CC: Jason Wang Signed-off-by: Vlad Yasevich Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 987c76c0af87c57566bd9cb99e7a9f04fcd949be Author: Karl Heiss Date: Fri Apr 25 14:26:30 2014 -0400 net: sctp: Don't transition to PF state when transport has exhausted 'Path.Max.Retrans'. [ Upstream commit 8c2eab9097dba50bcd73ed4632baccc3f34857f9 ] Don't transition to the PF state on every strike after 'Path.Max.Retrans'. Per draft-ietf-tsvwg-sctp-failover-03 Section 5.1.6: Additional (PMR - PFMR) consecutive timeouts on a PF destination confirm the path failure, upon which the destination transitions to the Inactive state. As described in [RFC4960], the sender (i) SHOULD notify ULP about this state transition, and (ii) transmit heartbeats to the Inactive destination at a lower frequency as described in Section 8.3 of [RFC4960]. This also prevents sending SCTP_ADDR_UNREACHABLE to the user as the state bounces between SCTP_INACTIVE and SCTP_PF for each subsequent strike. Signed-off-by: Karl Heiss Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0e9d5992216824b86c0faa074f32c0d1ca0e004b Author: Oliver Hartkopp Date: Sat Apr 26 21:18:32 2014 +0200 slip: fix spinlock variant [ Upstream commit ddcde142bed44490e338ed1124cb149976d355bb ] With commit cc9fa74e2a ("slip/slcan: added locking in wakeup function") a formerly missing locking was added to slip.c and slcan.c by Andre Naujoks.
Alexander Stein contributed the fix 367525c8c2 ("can: slcan: Fix spinlock variant") as the kernel lock debugging advised to use spin_lock_bh() instead of just using spin_lock(). This fix has to be applied to the same code section in slip.c for the same reason too. Signed-off-by: Oliver Hartkopp Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 21ff593721746f06ccf42148a8fb6bb0314d5fbb Author: Bjørn Mork Date: Fri Apr 25 19:00:34 2014 +0200 net: qmi_wwan: add a number of Dell devices [ Upstream commit 6f10c5d1b1aeddb63d33070abb8bc5a177beeb1f ] Dan writes: "The Dell drivers use the same configuration for PIDs: 81A2: Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card 81A3: Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card 81A4: Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card 81A8: Dell Wireless 5808 Gobi(TM) 4G LTE Mobile Broadband Card 81A9: Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card These devices are all clearly Sierra devices, but are also definitely Gobi-based. The A8 might be the MC7700/7710 and A9 is likely a MC7750. From DellGobi5kSetup.exe from the Dell drivers: usbif0: serial/firmware loader? usbif2: nmea usbif3: modem/ppp usbif8: net/QMI" Reported-by: AceLan Kao Reported-by: Dan Williams Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fa395b0f8b209af2f1d8a138390ebfe4177a4614 Author: Bjørn Mork Date: Fri Apr 25 19:00:33 2014 +0200 net: qmi_wwan: add a number of CMOTech devices [ Upstream commit 41be7d90993b1502d445bfc59e58348c258ce66a ] A number of older CMOTech modems are based on Qualcomm chips and exporting a QMI/wwan function. Reported-by: Lars Melin Signed-off-by: Bjørn Mork Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman commit 95c76ef3795addba03b8f150ba8dbd30fe4ecc24 Author: Bjørn Mork Date: Fri Apr 25 19:00:32 2014 +0200 net: qmi_wwan: add Alcatel L800MA [ Upstream commit 75573660c47a0db7cc931dcf154945610e02130a ] Device interface layout: 0: ff/ff/ff - serial 1: ff/00/00 - serial AT+PPP 2: ff/ff/ff - QMI/wwan 3: 08/06/50 - storage Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f165a8896c49d5edaa232f2303556bf695943a91 Author: Bjørn Mork Date: Fri Apr 25 19:00:31 2014 +0200 net: qmi_wwan: add Olivetti Olicard 500 [ Upstream commit efc0b25c3add97717ece57bf5319792ca98f348e ] Device interface layout: 0: ff/ff/ff - serial 1: ff/ff/ff - serial AT+PPP 2: 08/06/50 - storage 3: ff/ff/ff - serial 4: ff/ff/ff - QMI/wwan Reported-by: Julio Araujo Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8d5bb0cc4daa9d4d2b8e416fc70584759d1dbb4f Author: Bjørn Mork Date: Fri Apr 25 19:00:30 2014 +0200 net: qmi_wwan: add Sierra Wireless MC7305/MC7355 [ Upstream commit 9214224e43e4264b02686ea8b455f310935607b5 ] Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 88724e6b0d37b480a84060789651d16365c78eba Author: Bjørn Mork Date: Fri Apr 25 19:00:29 2014 +0200 net: qmi_wwan: add Sierra Wireless MC73xx [ Upstream commit 1c138607a7be64074d7fba68d0d533ec38f9d17b ] Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d2c3d4f3bcfd826546cabcc1850b1a14ab2b4749 Author: Bjørn Mork Date: Fri Apr 25 19:00:28 2014 +0200 net: qmi_wwan: add Sierra Wireless EM7355 [ Upstream commit b85f5deaf052340021d025e120a9858f084a1d79 ] Signed-off-by: Bjørn Mork Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 26d8db95df9a5d799978e5ae0b9574c79a5eae3b Author: Xufeng Zhang Date: Fri Apr 25 16:55:41 2014 +0800 sctp: reset flowi4_oif parameter on route lookup [ Upstream commit 85350871317a5adb35519d9dc6fc9e80809d42ad ] commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) introduces another regression which is very similar to the problem of commit e6b45241c (ipv4: reset flowi parameters on route connect) wants to fix: Before we call ip_route_output_key() in sctp_v4_get_dst() to get a dst that matches a bind address as the source address, we have already called this function previously and the flowi parameters have been initialized including flowi4_oif, so when we call this function again, the process in __ip_route_output_key() will be different because of the setting of flowi4_oif, and we'll get a networking device which corresponds to the inputted flowi4_oif as the output device, this is wrong because we'll never hit this place if the previously returned source address of dst match one of the bound addresses. To reproduce this problem, a vlan setting is enough: # ifconfig eth0 up # route del default # vconfig add eth0 2 # vconfig add eth0 3 # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0 # route add default gw 10.0.1.254 dev eth0.2 # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0 # ip rule add from 10.0.0.14 table 4 # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3 # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I You'll detect that all the flow are routed to eth0.2(10.0.1.254). Signed-off-by: Xufeng Zhang Signed-off-by: Julian Anastasov Acked-by: Vlad Yasevich Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 4a68caa1b3d3f382bde56a6a3d3430bcb53ecf7f Author: Toshiaki Makita Date: Fri Apr 25 17:01:18 2014 +0900 bridge: Handle IFLA_ADDRESS correctly when creating bridge device [ Upstream commit 30313a3d5794472c3548d7288e306a5492030370 ] When bridge device is created with IFLA_ADDRESS, we are not calling br_stp_change_bridge_id(), which leads to incorrect local fdb management and bridge id calculation, and prevents us from receiving frames on the bridge device. Reported-by: Tom Gundersen Signed-off-by: Toshiaki Makita Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 321dbc0d61b02e47f0159e8f194ab25b85bae04d Author: Kumar Sundararajan Date: Thu Apr 24 09:48:53 2014 -0400 ipv6: fib: fix fib dump restart [ Upstream commit 1c2658545816088477e91860c3a645053719cb54 ] When the ipv6 fib changes during a table dump, the walk is restarted and the number of nodes dumped are skipped. But the existing code doesn't advance to the next node after a node is skipped. This can cause the dump to loop or produce lots of duplicates when the fib is modified during the dump. This change advances the walk to the next node if the current node is skipped after a restart. Signed-off-by: Kumar Sundararajan Signed-off-by: Chris Mason Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 657cad06d60c2c7f032257125683627fb5480d21 Author: David Gibson Date: Thu Apr 24 10:22:36 2014 +1000 rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set [ Upstream commit c53864fd60227de025cb79e05493b13f69843971 ] Since 115c9b81928360d769a76c632bae62d15206a94a (rtnetlink: Fix problem with buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST attribute if they were solicited by a GETLINK message containing an IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag. 
That was done because some user programs broke when they received more data than expected - because IFLA_VFINFO_LIST contains information for each VF it can become large if there are many VFs. However, the IFLA_VF_PORTS attribute, supplied for devices which implement ndo_get_vf_port (currently the 'enic' driver only), has the same problem. It supplies per-VF information and can therefore become large, but it is not currently conditional on the IFLA_EXT_MASK value. Worse, it interacts badly with the existing EXT_MASK handling. When IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet. netlink_dump() will misinterpret this as having finished the listing and omit data for this interface and all subsequent ones. That can cause getifaddrs(3) to enter an infinite loop. This patch addresses the problem by only supplying IFLA_VF_PORTS when IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set. Signed-off-by: David Gibson Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 22d964f55e6d59186415fc017b8095992805c916 Author: David Gibson Date: Thu Apr 24 10:22:35 2014 +1000 rtnetlink: Warn when interface's information won't fit in our packet [ Upstream commit 973462bbde79bb827824c73b59027a0aed5c9ca6 ] Without IFLA_EXT_MASK specified, the information reported for a single interface in response to RTM_GETLINK is expected to fit within a netlink packet of NLMSG_GOODSIZE. If it doesn't, however, things will go badly wrong, When listing all interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first message in a packet as the end of the listing and omit information for that interface and all subsequent ones. This can cause getifaddrs(3) to enter an infinite loop. 
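For illustration only, the dump failure described above can be modelled in a few lines of Python. This is a toy sketch, not kernel code: `fill_ifinfo`, `dump_interfaces`, `GOODSIZE`, and the record sizes are hypothetical stand-ins for rtnl_fill_ifinfo(), the netlink_dump() loop, and NLMSG_GOODSIZE. It only assumes what the message states: a record that does not fit yields -EMSGSIZE, and a packet that ends up empty is read as the end of the listing.

```python
import errno

GOODSIZE = 100  # hypothetical stand-in for NLMSG_GOODSIZE, in bytes


def fill_ifinfo(packet, used, name, size):
    """Append one interface record, or return -EMSGSIZE if it does not fit."""
    if used + size > GOODSIZE:
        return -errno.EMSGSIZE
    packet.append(name)
    return size


def dump_interfaces(interfaces):
    """Toy dump loop: an empty packet is taken to mean 'listing finished'."""
    packets, idx = [], 0
    while True:
        packet, used = [], 0
        while idx < len(interfaces):
            name, size = interfaces[idx]
            ret = fill_ifinfo(packet, used, name, size)
            if ret < 0:
                break  # no room left: resume from idx in the next packet
            used += ret
            idx += 1
        if not packet:
            # Nothing was added. This is also what happens when the first
            # record alone exceeds GOODSIZE, so the dump ends early.
            return packets
        packets.append(packet)
```

Calling `dump_interfaces([("lo", 40), ("eth0", 40), ("enic0", 150), ("eth1", 40)])` yields a single packet holding `lo` and `eth0`; the oversized `enic0` record and everything after it are silently omitted, which is the behaviour the message describes.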
This patch won't fix the problem, but it will WARN_ON() making it easier to track down what's going wrong. Signed-off-by: David Gibson Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 265bcb0ea1205296273412e8a7051756d5a9d61f Author: Andrew Lutomirski Date: Wed Apr 16 21:41:34 2014 -0700 net: Fix ns_capable check in sock_diag_put_filterinfo [ Upstream commit 78541c1dc60b65ecfce5a6a096fc260219d6784e ] The caller needs capabilities on the namespace being queried, not on their own namespace. This is a security bug, although it likely has only a minor impact. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski Acked-by: Nicolas Dichtel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3938b0336a93fa5faa242dc9e5823ac69df9e066 Author: Vlad Yasevich Date: Thu Apr 17 17:26:50 2014 +0200 net: sctp: cache auth_enable per endpoint [ Upstream commit b14878ccb7fac0242db82720b784ab62c467c0dc ] Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] 
Call Trace: [] sctp_auth_asoc_set_default_hmac+0x68/0x80 [] sctp_process_init+0x5e0/0x8a4 [] sctp_sf_do_5_1B_init+0x234/0x34c [] sctp_do_sm+0xb4/0x1e8 [] sctp_endpoint_bh_rcv+0x1c4/0x214 [] sctp_rcv+0x588/0x630 [] sctp6_rcv+0x10/0x24 [] ip6_input+0x2c0/0x440 [] __netif_receive_skb_core+0x4a8/0x564 [] process_backlog+0xb4/0x18c [] net_rx_action+0x12c/0x210 [] __do_softirq+0x17c/0x2ac [] irq_exit+0x54/0xb0 [] ret_from_irq+0x0/0x4 [] rm7k_wait_irqoff+0x24/0x48 [] cpu_startup_entry+0xc0/0x148 [] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard Signed-off-by: Vlad Yasevich Signed-off-by: Daniel Borkmann Acked-by: Neil Horman Tested-by: Joshua Kinard Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit e91a0ef7697860eaea376bb2acdee7427866ac62 Author: Ivan Vecera Date: Thu Apr 17 14:51:08 2014 +0200 tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled The patch fixes a problem with dropped jumbo frames after usage of 'ethtool -G ... rx'. Scenario: 1. ip link set eth0 up 2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo 3. ip link set mtu 9000 dev eth0 The ethtool command sets rx_jumbo_pending to zero so any received jumbo packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N' to work around the issue. The patch changes the logic so rx_jumbo_pending value is changed only if jumbo frames are enabled (MTU > 1500). Signed-off-by: Ivan Vecera Acked-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1c030bf16f8c157ef227ee598ac40ff75fb7ec9c Author: Vlad Yasevich Date: Fri May 16 17:04:56 2014 -0400 macvlan: Fix lockdep warnings with stacked macvlan devices [ Upstream commit c674ac30c549596295eb0a5af7f4714c0b905b6f ] Macvlan devices try to avoid stacking, but that's not always successful or even desired. As an example, the following configuration is perfectly legal and valid: eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1 However, this configuration produces the following lockdep trace: [ 115.620418] ====================================================== [ 115.620477] [ INFO: possible circular locking dependency detected ] [ 115.620516] 3.15.0-rc1+ #24 Not tainted [ 115.620540] ------------------------------------------------------- [ 115.620577] ip/1704 is trying to acquire lock: [ 115.620604] (&vlan_netdev_addr_lock_key/1){+.....}, at: [] dev_uc_sync+0x3c/0x80 [ 115.620686] but task is already holding lock: [ 115.620723] (&macvlan_netdev_addr_lock_key){+.....}, at: [] dev_set_rx_mode+0x1e/0x40 [ 115.620795] which lock already depends on the new lock.
[ 115.620853] the existing dependency chain (in reverse order) is: [ 115.620894] -> #1 (&macvlan_netdev_addr_lock_key){+.....}: [ 115.620935] [] lock_acquire+0xa2/0x130 [ 115.620974] [] _raw_spin_lock_nested+0x37/0x50 [ 115.621019] [] vlan_dev_set_rx_mode+0x53/0x110 [8021q] [ 115.621066] [] __dev_set_rx_mode+0x57/0xa0 [ 115.621105] [] dev_set_rx_mode+0x26/0x40 [ 115.621143] [] __dev_open+0xde/0x140 [ 115.621174] [] __dev_change_flags+0x9d/0x170 [ 115.621174] [] dev_change_flags+0x29/0x60 [ 115.621174] [] do_setlink+0x321/0x9a0 [ 115.621174] [] rtnl_newlink+0x51f/0x730 [ 115.621174] [] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [] netlink_unicast+0xf0/0x1c0 [ 115.621174] [] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [] sock_sendmsg+0x8b/0xc0 [ 115.621174] [] ___sys_sendmsg+0x369/0x380 [ 115.621174] [] __sys_sendmsg+0x42/0x80 [ 115.621174] [] SyS_sendmsg+0x12/0x20 [ 115.621174] [] system_call_fastpath+0x16/0x1b [ 115.621174] -> #0 (&vlan_netdev_addr_lock_key/1){+.....}: [ 115.621174] [] __lock_acquire+0x1773/0x1a60 [ 115.621174] [] lock_acquire+0xa2/0x130 [ 115.621174] [] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [] dev_uc_sync+0x3c/0x80 [ 115.621174] [] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [] dev_set_rx_mode+0x26/0x40 [ 115.621174] [] __dev_open+0xde/0x140 [ 115.621174] [] __dev_change_flags+0x9d/0x170 [ 115.621174] [] dev_change_flags+0x29/0x60 [ 115.621174] [] do_setlink+0x321/0x9a0 [ 115.621174] [] rtnl_newlink+0x51f/0x730 [ 115.621174] [] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [] netlink_unicast+0xf0/0x1c0 [ 115.621174] [] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [] sock_sendmsg+0x8b/0xc0 [ 115.621174] [] ___sys_sendmsg+0x369/0x380 [ 115.621174] [] __sys_sendmsg+0x42/0x80 [ 115.621174] [] SyS_sendmsg+0x12/0x20 [ 115.621174] 
[] system_call_fastpath+0x16/0x1b [ 115.621174] other info that might help us debug this: [ 115.621174] Possible unsafe locking scenario: [ 115.621174] CPU0 CPU1 [ 115.621174] ---- ---- [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] *** DEADLOCK *** [ 115.621174] 2 locks held by ip/1704: [ 115.621174] #0: (rtnl_mutex){+.+.+.}, at: [] rtnetlink_rcv+0x1b/0x40 [ 115.621174] #1: (&macvlan_netdev_addr_lock_key){+.....}, at: [] dev_set_rx_mode+0x1e/0x40 [ 115.621174] stack backtrace: [ 115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24 [ 115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010 [ 115.621174] ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0 [ 115.621174] ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8 [ 115.621174] 0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230 [ 115.621174] Call Trace: [ 115.621174] [] dump_stack+0x4d/0x66 [ 115.621174] [] print_circular_bug+0x200/0x20e [ 115.621174] [] __lock_acquire+0x1773/0x1a60 [ 115.621174] [] ? trace_hardirqs_on_caller+0xb2/0x1d0 [ 115.621174] [] lock_acquire+0xa2/0x130 [ 115.621174] [] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [] dev_uc_sync+0x3c/0x80 [ 115.621174] [] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [] dev_set_rx_mode+0x26/0x40 [ 115.621174] [] __dev_open+0xde/0x140 [ 115.621174] [] __dev_change_flags+0x9d/0x170 [ 115.621174] [] dev_change_flags+0x29/0x60 [ 115.621174] [] ? mem_cgroup_bad_page_check+0x21/0x30 [ 115.621174] [] do_setlink+0x321/0x9a0 [ 115.621174] [] ? __lock_acquire+0x37c/0x1a60 [ 115.621174] [] rtnl_newlink+0x51f/0x730 [ 115.621174] [] ? 
rtnl_newlink+0xe9/0x730 [ 115.621174] [] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [] ? trace_hardirqs_on+0xd/0x10 [ 115.621174] [] ? rtnetlink_rcv+0x1b/0x40 [ 115.621174] [] ? rtnetlink_rcv+0x40/0x40 [ 115.621174] [] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [] netlink_unicast+0xf0/0x1c0 [ 115.621174] [] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [] sock_sendmsg+0x8b/0xc0 [ 115.621174] [] ? might_fault+0x5f/0xb0 [ 115.621174] [] ? might_fault+0xa8/0xb0 [ 115.621174] [] ? might_fault+0x5f/0xb0 [ 115.621174] [] ? verify_iovec+0x5e/0xe0 [ 115.621174] [] ___sys_sendmsg+0x369/0x380 [ 115.621174] [] ? __do_page_fault+0x11d/0x570 [ 115.621174] [] ? up_read+0x1f/0x40 [ 115.621174] [] ? __do_page_fault+0x214/0x570 [ 115.621174] [] ? mntput_no_expire+0x6b/0x1c0 [ 115.621174] [] ? mntput_no_expire+0x17/0x1c0 [ 115.621174] [] ? mntput+0x24/0x40 [ 115.621174] [] __sys_sendmsg+0x42/0x80 [ 115.621174] [] SyS_sendmsg+0x12/0x20 [ 115.621174] [] system_call_fastpath+0x16/0x1b Fix this by correctly providing macvlan lockdep class. Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 102e103f76a52e68f2169b86f0536debaf65d548 Author: Vlad Yasevich Date: Fri May 16 17:04:55 2014 -0400 vlan: Fix lockdep warning with stacked vlan devices. [ Upstream commit d38569ab2bba6e6b3233acfc3a84cdbcfbd1f79f ] This reverts commit dc8eaaa006350d24030502a4521542e74b5cb39f. vlan: Fix lockdep warning when vlan dev handle notification Instead we use the new API to find the lock subclass of our vlan device. This way we can support configurations where vlans are interspersed with other devices: bond -> vlan -> macvlan -> vlan Signed-off-by: Vlad Yasevich Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman commit d0166f814a1daef5992c19d5c18f2860e17ad2f1 Author: Vlad Yasevich Date: Fri May 16 17:04:54 2014 -0400 net: Allow for more then a single subclass for netif_addr_lock [ Upstream commit 25175ba5c9bff9aaf0229df34bb5d54c81633ec3 ] Currently netif_addr_lock_nested assumes that there can be only a single nesting level between 2 devices. However, if we have multiple devices of the same type stacked, this fails. For example: eth0 <-- vlan0.10 <-- vlan0.10.20 A more complicated configuration may stack more than one type of device in different order. Ex: eth0 <-- vlan0.10 <-- macvlan0 <-- vlan1.10.20 <-- macvlan1 This patch adds an ndo_* function that allows each stackable device to report its nesting level. If the device doesn't provide this function, a default subclass of 1 is used. Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 69ab2a8b80f2a479df6728effb935174dd5303bf Author: Vlad Yasevich Date: Fri May 16 17:04:53 2014 -0400 net: Find the nesting level of a given device by type. [ Upstream commit 4085ebe8c31face855fd01ee40372cb4aab1df3a ] Multiple devices in the kernel can be stacked/nested and they need to know their nesting level for the purposes of lockdep. This patch provides a generic function that determines a nesting level of a particular device by its type (ex: vlan, macvlan, etc). We only care about nesting of the same type of devices. For example: eth0 <- vlan0.10 <- macvlan0 <- vlan1.20 The nesting level of vlan1.20 would be 1, since there is another vlan in the stack under it. Signed-off-by: Vlad Yasevich Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman commit e2c7f968588cdafe7cca360acd9665e51a1165d8 Author: dingtianhong Date: Thu Apr 17 18:40:36 2014 +0800 vlan: Fix lockdep warning when vlan dev handle notification [ Upstream commit dc8eaaa006350d24030502a4521542e74b5cb39f ] When I open the LOCKDEP config and run these steps: modprobe 8021q vconfig add eth2 20 vconfig add eth2.20 30 ifconfig eth2 xx.xx.xx.xx then the Call Trace happened: [32524.386288] ============================================= [32524.386293] [ INFO: possible recursive locking detected ] [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G O [32524.386302] --------------------------------------------- [32524.386306] ifconfig/3103 is trying to acquire lock: [32524.386310] (&vlan_netdev_addr_lock_key/1){+.....}, at: [] dev_mc_sync+0x64/0xb0 [32524.386326] [32524.386326] but task is already holding lock: [32524.386330] (&vlan_netdev_addr_lock_key/1){+.....}, at: [] dev_set_rx_mode+0x23/0x40 [32524.386341] [32524.386341] other info that might help us debug this: [32524.386345] Possible unsafe locking scenario: [32524.386345] [32524.386350] CPU0 [32524.386352] ---- [32524.386354] lock(&vlan_netdev_addr_lock_key/1); [32524.386359] lock(&vlan_netdev_addr_lock_key/1); [32524.386364] [32524.386364] *** DEADLOCK *** [32524.386364] [32524.386368] May be due to missing lock nesting notation [32524.386368] [32524.386373] 2 locks held by ifconfig/3103: [32524.386376] #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x12/0x20 [32524.386387] #1: (&vlan_netdev_addr_lock_key/1){+.....}, at: [] dev_set_rx_mode+0x23/0x40 [32524.386398] [32524.386398] stack backtrace: [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G O 3.14.0-rc2-0.7-default+ #35 [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [32524.386414] ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8 [32524.386421] ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0 [32524.386428] 000003cc00000000 0000000000000002 
ffff8800d8a8e5f8 0000000000000000 [32524.386435] Call Trace: [32524.386441] [] dump_stack+0x6a/0x78 [32524.386448] [] __lock_acquire+0x7ab/0x1940 [32524.386454] [] ? __lock_acquire+0x3ea/0x1940 [32524.386459] [] lock_acquire+0xe4/0x110 [32524.386464] [] ? dev_mc_sync+0x64/0xb0 [32524.386471] [] _raw_spin_lock_nested+0x2a/0x40 [32524.386476] [] ? dev_mc_sync+0x64/0xb0 [32524.386481] [] dev_mc_sync+0x64/0xb0 [32524.386489] [] vlan_dev_set_rx_mode+0x2b/0x50 [8021q] [32524.386495] [] __dev_set_rx_mode+0x5f/0xb0 [32524.386500] [] dev_set_rx_mode+0x2b/0x40 [32524.386506] [] __dev_open+0xef/0x150 [32524.386511] [] __dev_change_flags+0xa7/0x190 [32524.386516] [] dev_change_flags+0x32/0x80 [32524.386524] [] devinet_ioctl+0x7d6/0x830 [32524.386532] [] ? dev_ioctl+0x34b/0x660 [32524.386540] [] inet_ioctl+0x80/0xa0 [32524.386550] [] sock_do_ioctl+0x2d/0x60 [32524.386558] [] sock_ioctl+0x82/0x2a0 [32524.386568] [] do_vfs_ioctl+0x93/0x590 [32524.386578] [] ? rcu_read_lock_held+0x45/0x50 [32524.386586] [] ? __fget_light+0x105/0x110 [32524.386594] [] SyS_ioctl+0x91/0xb0 [32524.386604] [] system_call_fastpath+0x16/0x1b ======================================================================== The reason is that all of the addr_lock_key for vlan dev have the same class, so if we change the status for vlan dev, the vlan dev and its real dev will hold the same class of addr_lock_key together, so the warning happened. we should distinguish the lock depth for vlan dev and its real dev. v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which could support to add 8 vlan id on a same vlan dev, I think it is enough for current scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id, and the vlan dev would not meet the same class key with its real dev. The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan dev could get a suitable class key. 
v2->v3: According to David's suggestion, I use the subclass to distinguish the lock key for vlan dev and its real dev, but it makes no sense, because the difference for subclass in the lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth to distinguish the different depth for every vlan dev, the same depth of the vlan dev could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h, I think it is enough here, the lockdep should never exceed that value. v3->v4: Adding a huge array of locking keys will waste static kernel memory and is not an appropriate method, we could use _nested() variants to fix the problem, calculate the depth for every vlan dev, and use the depth as the subclass for addr_lock_key. Signed-off-by: Ding Tianhong Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 846a8da8c1e891ac0201a90739b8f3275bcf6e8f Author: Nicolas Dichtel Date: Mon Apr 14 17:11:38 2014 +0200 ip6_gre: don't allow to remove the fb_tunnel_dev [ Upstream commit 54d63f787b652755e66eb4dd8892ee6d3f5197fc ] It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9038f47699cad8883196ac57bae36418ab60d60c Author: Vlad Yasevich Date: Mon Apr 14 17:37:26 2014 -0400 net: Start with correct mac_len in skb_network_protocol [ Upstream commit 1e785f48d29a09b6cf96db7b49b6320dada332e1 ] Sometimes, when the packet arrives at skb_mac_gso_segment() its skb->mac_len already accounts for some of the mac length headers in the packet.
This seems to happen when forwarding through and OpenSSL tunnel. When we start looking for any vlan headers in skb_network_protocol() we seem to ignore any of the already known mac headers and start with an ETH_HLEN. This results in an incorrect offset, dropped TSO frames and general slowness of the connection. We can start counting from the known skb->mac_len and return at least that much if all mac level headers are known and accounted for. Fixes: 53d6471cef17262d3ad1c7ce8982a234244f68ec (net: Account for all vlan headers in skb_mac_gso_segment) CC: Eric Dumazet CC: Daniel Borkman Tested-by: Martin Filip Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bde6d78b4a05212e6611c7c2fe104fa72512b6eb Author: Daniel Borkmann Date: Mon Apr 14 21:45:17 2014 +0200 Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" [ Upstream commit 362d52040c71f6e8d8158be48c812d7729cb8df1 ] This reverts commit ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") as it introduced a serious performance regression on SCTP over IPv4 and IPv6, though a not as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs. 
    Current state:

    [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
    Time: Fri, 11 Apr 2014 17:56:21 GMT
    Connecting to host 192.168.241.3, port 5201
          Cookie: Lab200slot2.1397238981.812898.548918
    [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
    [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
    [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
    [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
    [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
    [  4]   6.21-6.21   sec  0.00 Bytes   0.00 bits/sec
    [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
    [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
    [  4]  11.45-11.45  sec  0.00 Bytes   0.00 bits/sec
    [  4]  11.45-11.45  sec  0.00 Bytes   0.00 bits/sec
    [  4]  11.45-11.45  sec  0.00 Bytes   0.00 bits/sec
    [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
    [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
    [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
    [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
    [  4]  16.79-16.79  sec  0.00 Bytes   0.00 bits/sec
    [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
    (etc)

    [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
    Time: Fri, 11 Apr 2014 19:08:41 GMT
    Connecting to host 2001:db8:0:f101::1, port 5201
          Cookie: Lab200slot2.1397243321.714295.2b3f7c
    [  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
    [  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
    [  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
    [  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
    [  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
    [  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
    [  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
    [  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
    [  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
    [  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
    [  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
    [  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
    [  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
    [  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
    (etc)

    After patch:

    [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
    iperf version 3.0.1 (10 January 2014)
    Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
    Time: Mon, 14 Apr 2014 16:40:48 GMT
    Connecting to host 192.168.240.3, port 5201
          Cookie: Lab200slot2.1397493648.413274.65e131
    [  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
    Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
    [  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
    [  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
    [  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
    [  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
    [  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
    [  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
    [  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec

    With the reverted patch applied, the SCTP/IPv4 performance is back
    to normal on latest upstream for IPv4 and IPv6 and has same throughput
    as 3.4.2 test kernel, steady and interval reports are smooth again.

    Fixes: ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
    Reported-by: Peter Butler
    Reported-by: Dongsheng Song
    Reported-by: Fengguang Wu
    Tested-by: Peter Butler
    Signed-off-by: Daniel Borkmann
    Cc: Matija Glavinic Pecotic
    Cc: Alexander Sverdlin
    Cc: Vlad Yasevich
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 72beb5639e05f940135ed00bc2fa0200d62074a6
Author: Mathias Krause
Date:   Sun Apr 13 18:23:33 2014 +0200

    filter: prevent nla extensions to peek beyond the end of the message

    [ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ]

    The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
    for a minimal message length before testing the supplied offset to be
    within the bounds of the message. This allows the subtraction of the nla
    header to underflow and therefore -- as the data type is unsigned --
    allowing far to big offset and length values for the search of the
    netlink attribute.

    The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
    also wrong. It has the minuend and subtrahend mixed up, therefore
    calculates a huge length value, allowing to overrun the end of the
    message while looking for the netlink attribute.

    The following three BPF snippets will trigger the bugs when attached to
    a UNIX datagram socket and parsing a message with length 1, 2 or 3.

     ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
     | ld    #0x87654321
     | ldx   #42
     | ld    #nla
     | ret   a
     `---

     ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
     | ld    #0x87654321
     | ldx   #42
     | ld    #nlan
     | ret   a
     `---

     ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
     | ; (needs a fake netlink header at offset 0)
     | ld    #0
     | ldx   #42
     | ld    #nlan
     | ret   a
     `---

    Fix the first issue by ensuring the message length fulfills the minimal
    size constrains of a nla header. Fix the second bug by getting the math
    for the remainder calculation right.

    Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
    Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
    Cc: Patrick McHardy
    Cc: Pablo Neira Ayuso
    Signed-off-by: Mathias Krause
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 25ec5dea4388354978acb3823bb475a84219f36b
Author: Julian Anastasov
Date:   Sun Apr 13 18:08:02 2014 +0300

    ipv4: return valid RTA_IIF on ip route get

    [ Upstream commit 91146153da2feab18efab2e13b0945b6bb704ded ]

    Extend commit 13378cad02afc2adc6c0e07fca03903c7ada0b37
    ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid
    RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0
    which is displayed as 'iif *'.

    inet_iif is not appropriate to use because skb_iif is not set.
    Use the skb->dev->ifindex instead.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 2089f7346ae535fa444853426adfafef25107247
Author: Wang, Xiaoming
Date:   Mon Apr 14 12:30:45 2014 -0400

    net: ipv4: current group_info should be put after using.

    [ Upstream commit b04c46190219a4f845e46a459e3102137b7f6cac ]

    Plug a group_info refcount leak in ping_init.
    group_info is only needed during initialization and
    the code failed to release the reference on exit.
    While here move grabbing the reference to a place
    where it is actually needed.

    Signed-off-by: Chuansheng Liu
    Signed-off-by: Zhang Dongxing
    Signed-off-by: xiaoming wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit f39b7458abd594e588350868f027af36a4ff3b73
Author: Nicolas Dichtel
Date:   Fri Apr 11 15:51:19 2014 +0200

    vti: don't allow to add the same tunnel twice

    [ Upstream commit 8d89dcdf80d88007647945a753821a06eb6cc5a5 ]

    Before the patch, it was possible to add two times the same tunnel:
    ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
    ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41

    It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with
    the argument dev->type, which was set only later (when calling ndo_init
    handler in register_netdevice()). Let's set this type in the setup
    handler, which is called before newlink handler.
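
The ordering problem described above can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: `Dev`, `find_tunnel`, `newlink`, `setup_without_type` and `setup_with_type` are hypothetical stand-ins for the net_device, ip_tunnel_find(), ip_tunnel_newlink() and the setup handler, and the type values are plain strings rather than real ARPHRD_* constants.

```python
from dataclasses import dataclass


@dataclass
class Dev:
    name: str
    remote: str
    local: str
    key: int
    type: str = "UNSET"  # models dev->type before any handler has set it


def find_tunnel(tunnels, remote, local, key, dev_type):
    """Model of the lookup: endpoints, key and device type must all match."""
    for t in tunnels:
        if (t.remote, t.local, t.key, t.type) == (remote, local, key, dev_type):
            return t
    return None


def setup_without_type(dev):
    """Old behaviour in this model: the setup handler leaves the type unset."""


def setup_with_type(dev):
    """Fixed behaviour in this model: the type is set before newlink runs."""
    dev.type = "TUNNEL"


def newlink(tunnels, dev, setup):
    setup(dev)  # the setup handler runs before the newlink handler
    if find_tunnel(tunnels, dev.remote, dev.local, dev.key, dev.type):
        return False  # duplicate detected and rejected
    dev.type = "TUNNEL"  # models ndo_init setting the type at registration
    tunnels.append(dev)
    return True


def add_twice(setup):
    tunnels = []
    first = newlink(tunnels, Dev("vti1", "10.16.0.121", "10.16.0.249", 41), setup)
    second = newlink(tunnels, Dev("vti2", "10.16.0.121", "10.16.0.249", 41), setup)
    return first, second
```

In this model `add_twice(setup_without_type)` accepts both devices, because the second lookup compares an unset type against an already registered one, while `add_twice(setup_with_type)` rejects the second device.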

    Introduced by commit b9959fd3b0fa ("vti: switch to new ip tunnel code").

    CC: Cong Wang
    CC: Steffen Klassert
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 920fd3b059e73e2a89e379d165555a499cc36107
Author: Nicolas Dichtel
Date:   Fri Apr 11 15:51:18 2014 +0200

    gre: don't allow to add the same tunnel twice

    [ Upstream commit 5a4552752d8f7f4cef1d98775ece7adb7616fde2 ]

    Before the patch, it was possible to add two times the same tunnel:
    ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
    ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249

    It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with
    the argument dev->type, which was set only later (when calling ndo_init
    handler in register_netdevice()). Let's set this type in the setup
    handler, which is called before newlink handler.

    Introduced by commit c54419321455 ("GRE: Refactor GRE tunneling code.").

    CC: Pravin B Shelar
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 6c09ba984cbb626b44af1b028ed27cdf9a8dd74f
Author: Eric Dumazet
Date:   Thu Apr 10 21:23:36 2014 -0700

    ipv6: Limit mtu to 65575 bytes

    [ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ]

    Francois reported that setting big mtu on loopback device could prevent
    tcp sessions making progress.

    We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.

    We must limit the IPv6 MTU to (65535 + 40) bytes in theory.

    Tested:

    ifconfig lo mtu 70000
    netperf -H ::1

    Before patch : Throughput :   0.05 Mbits
    After patch  : Throughput :  35484 Mbits

    Reported-by: Francois WELLENREITER
    Signed-off-by: Eric Dumazet
    Acked-by: YOSHIFUJI Hideaki
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit d526f28383673eb871ea383fce884d867cdd6bb9
Author: Toshiaki Makita
Date:   Wed Apr 9 17:00:30 2014 +0900

    bridge: Fix double free and memory leak around br_allowed_ingress

    [ Upstream commit eb7076182d1ae4bc4641534134ed707100d76acc ]

    br_allowed_ingress() has two problems.

    1. If br_allowed_ingress() is called by br_handle_frame_finish() and
       vlan_untag() in br_allowed_ingress() fails, skb will be freed by both
       vlan_untag() and br_handle_frame_finish().

    2. If br_allowed_ingress() is called by br_dev_xmit() and
       br_allowed_ingress() fails, the skb will not be freed.

    Fix these two problems by freeing the skb in br_allowed_ingress()
    if it fails.

    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit cfe1592425717a19757afa497e1f2299352322a2
Author: Thomas Richter
Date:   Wed Apr 9 12:52:59 2014 +0200

    bonding: Remove debug_fs files when module init fails

    [ Upstream commit db29868653394937037d71dc3545768302dda643 ]

    Remove the bonding debug_fs entries when the
    module initialization fails. The debug_fs
    entries should be removed together with all other
    already allocated resources.

    Signed-off-by: Thomas Richter
    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 80f46884ae09ee9cb7f1ca4368aca350953bb7e9
Author: Florian Westphal
Date:   Wed Apr 9 10:28:50 2014 +0200

    net: core: don't account for udp header size when computing seglen

    [ Upstream commit 6d39d589bb76ee8a1c6cde6822006ae0053decff ]

    In case of tcp, gso_size contains the tcpmss.

    For UFO (udp fragmentation offloading) skbs, gso_size is the fragment
    payload size, i.e. we must not account for udp header size.

    Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet
    will be needlessly fragmented in the forward path, because we think its
    individual segments are too large for the outgoing link.
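
A worked example may make the arithmetic behind the UFO case concrete. The sketch below is a simplified Python model, not the kernel function: the helper names, the `count_udp_header` switch, the fixed header sizes (20-byte IPv4 and TCP headers without options, 8-byte UDP header) and the 1500-byte MTU are all assumptions chosen for illustration.

```python
IP_HDR_LEN = 20   # assumed IPv4 header without options
TCP_HDR_LEN = 20  # assumed TCP header without options
UDP_HDR_LEN = 8


def transport_seglen(gso_size, is_tcp, count_udp_header=False):
    """Length of one segment at the transport layer in this model."""
    if is_tcp:
        # For TCP, gso_size is the MSS, so the TCP header comes on top.
        return TCP_HDR_LEN + gso_size
    # For UFO, gso_size is the fragment payload size. Adding the UDP header
    # again (count_udp_header=True) models the behaviour before the fix.
    return (UDP_HDR_LEN if count_udp_header else 0) + gso_size


def network_seglen(gso_size, is_tcp, count_udp_header=False):
    """Length of one segment including the IP header."""
    return IP_HDR_LEN + transport_seglen(gso_size, is_tcp, count_udp_header)


def fits_mtu(mtu, gso_size, is_tcp, count_udp_header=False):
    return network_seglen(gso_size, is_tcp, count_udp_header) <= mtu


MTU = 1500
UFO_GSO_SIZE = MTU - IP_HDR_LEN  # 1480-byte fragment payload
TCP_MSS = MTU - IP_HDR_LEN - TCP_HDR_LEN  # 1460
```

Under these assumptions a UFO skb with a 1480-byte fragment payload is computed as 1508 bytes when the UDP header is counted, which exceeds the 1500-byte MTU and would trigger the needless fragmentation, and as exactly 1500 bytes when it is not counted.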
Fixes: fe6cc55f3a9a053 ("net: ip, ipv6: handle gso skbs in forwarding path") Cc: Eric Dumazet Reported-by: Tobias Brunner Signed-off-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 55c7f1333a119001b21f9fa0e650827d96d05b08 Author: Dmitry Petukhov Date: Wed Apr 9 02:23:20 2014 +0600 l2tp: take PMTU from tunnel UDP socket [ Upstream commit f34c4a35d87949fbb0e0f31eba3c054e9f8199ba ] When l2tp driver tries to get PMTU for the tunnel destination, it uses the pointer to struct sock that represents PPPoX socket, while it should use the pointer that represents UDP socket of the tunnel. Signed-off-by: Dmitry Petukhov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 42c15611b670e03185f78bfdc8b6eaa3596b1049 Author: Daniel Borkmann Date: Wed Apr 9 16:10:20 2014 +0200 net: sctp: test if association is dead in sctp_wake_up_waiters [ Upstream commit 1e1cdf8ac78793e0875465e98a648df64694a8d0 ] In function sctp_wake_up_waiters(), we need to involve a test if the association is declared dead. If so, we don't have any reference to a possible sibling association anymore and need to invoke sctp_write_space() instead, and normally walk the socket's associations and notify them of new wmem space. The reason for special casing is that otherwise, we could run into the following issue when a sctp_primitive_SEND() call from sctp_sendmsg() fails, and tries to flush an association's outq, i.e. in the following way: sctp_association_free() `-> list_del(&asoc->asocs) <-- poisons list pointer asoc->base.dead = true sctp_outq_free(&asoc->outqueue) `-> __sctp_outq_teardown() `-> sctp_chunk_free() `-> consume_skb() `-> sctp_wfree() `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers if asoc->ep->sndbuf_policy=0 Therefore, only walk the list in an 'optimized' way if we find that the current association is still active. 
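
The failure mode and the guard can be modelled outside the kernel. The following Python sketch is only an illustration under stated assumptions: `Assoc`, `Sock`, `POISON` and `wake_up_waiters` are hypothetical names, a Python list plus `next` references stand in for the kernel's linked list, and the function returns the names it would wake instead of calling __sctp_write_space().

```python
POISON = object()  # models the poisoned list pointer left behind by list_del()


class Assoc:
    def __init__(self, name):
        self.name = name
        self.next = None
        self.dead = False


class Sock:
    def __init__(self, names):
        self.assocs = [Assoc(n) for n in names]
        self._relink()

    def _relink(self):
        # Rebuild the circular "next" references among live associations.
        count = len(self.assocs)
        for i, asoc in enumerate(self.assocs):
            asoc.next = self.assocs[(i + 1) % count]

    def free(self, asoc):
        # Models sctp_association_free(): unlink, poison, mark dead.
        self.assocs.remove(asoc)
        self._relink()
        asoc.next = POISON
        asoc.dead = True


def wake_up_waiters(sock, asoc):
    """Return the names that would be woken, in order."""
    if asoc.dead:
        # A dead association has no usable neighbour: notify the socket's
        # remaining associations the normal way instead.
        return [a.name for a in sock.assocs]
    woken = []
    tmp = asoc.next
    while True:
        woken.append(tmp.name)
        if tmp is asoc:
            break
        tmp = tmp.next
    return woken
```

Without the `dead` test, the traversal for a freed association would start from `POISON` and fail on the first attribute access, which is the model's analogue of dereferencing the poisoned pointer.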
We could also use list_del_init() in addition when we call sctp_association_free(), but as Vlad suggests, we want to trap such bugs and thus leave it poisoned as is. Why is it safe to resolve the issue by testing for asoc->base.dead? Parallel calls to sctp_sendmsg() are protected under socket lock, that is lock_sock()/release_sock(). Only within that path under lock held, we're setting skb/chunk owner via sctp_set_owner_w(). Eventually, chunks are freed directly by an association still under that lock. So when traversing association list on destruction time from sctp_wake_up_waiters() via sctp_wfree(), a different CPU can't be running sctp_wfree() while another one calls sctp_association_free() as both happens under the same lock. Therefore, this can also not race with setting/testing against asoc->base.dead as we are guaranteed for this to happen in order, under lock. Further, Vlad says: the times we check asoc->base.dead is when we've cached an association pointer for later processing. In between cache and processing, the association may have been freed and is simply still around due to reference counts. We check asoc->base.dead under a lock, so it should always be safe to check and not race against sctp_association_free(). Stress-testing seems fine now, too. Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket") Signed-off-by: Daniel Borkmann Cc: Vlad Yasevich Acked-by: Neil Horman Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a4800639980844fbcfe6ec8c89dca0934dce513d Author: Daniel Borkmann Date: Tue Apr 8 17:26:13 2014 +0200 net: sctp: wake up all assocs if sndbuf policy is per socket [ Upstream commit 52c35befb69b005c3fc5afdaae3a5717ad013411 ] SCTP charges chunks for wmem accounting via skb->truesize in sctp_set_owner_w(), and sctp_wfree() respectively as the reverse operation. 
If a sender runs out of wmem, it needs to wait via sctp_wait_for_sndbuf(), and gets woken up by a call to __sctp_write_space() mostly via sctp_wfree(). __sctp_write_space() is being called per association. Although we assign sk->sk_write_space() to sctp_write_space(), which is then being done per socket, it is only used if send space is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE is set and therefore not invoked in sock_wfree(). Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when transmitting packet") fixed an issue where in case sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. However, a still remaining issue is that if net.sctp.sndbuf_policy=0, that is accounting per socket, and one-to-many sockets are in use, the reclaimed write space from sctp_wfree() is 'unfairly' handed back on the server to the association that is the lucky one to be woken up again via __sctp_write_space(), while the remaining associations are never be woken up again (unless by a signal). The effect disappears with net.sctp.sndbuf_policy=1, that is wmem accounting per association, as it guarantees a fair share of wmem among associations. Therefore, if we have reclaimed memory in case of per socket accounting, wake all related associations to a socket in a fair manner, that is, traverse the socket association list starting from the current neighbour of the association and issue a __sctp_write_space() to everyone until we end up waking ourselves. This guarantees that no association is preferred over another and even if more associations are taken into the one-to-many session, all receivers will get messages from the server and are not stalled forever on high load. This setting still leaves the advantage of per socket accounting in touch as an association can still use up global limits if unused by others. 
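
The traversal order described above can be sketched in a few lines. This is a user-space Python illustration, not the kernel implementation: `wake_order` is a hypothetical helper, and a plain list stands in for the socket's association list.

```python
def wake_order(assocs, current):
    """Order in which associations are woken in this model.

    Start at the neighbour that follows `current`, wrap around the end of
    the list, and finish by waking `current` itself.
    """
    i = assocs.index(current)
    return assocs[i + 1:] + assocs[:i + 1]
```

For example, with associations A, B, C and D on one socket and B reclaiming write space, the model wakes C, D, A and finally B. Every association is woken exactly once per pass, and the starting point moves with whichever association freed memory, so no single association is always served first.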
Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.") Signed-off-by: Daniel Borkmann Cc: Thomas Graf Cc: Neil Horman Cc: Vlad Yasevich Acked-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 70d1e53b3b1f06cdf282e6753ffb11a95d5a6854 Author: Steven Rostedt Date: Tue Mar 18 11:27:37 2014 -0400 drm/i915: Do not dereference pointers from ring buffer in evict event commit 9297ebf29ad9118edd6c0fedc84f03e35028827d upstream. The TP_printk() should never dereference any pointers, because the ring buffer can be read at some unknown time in the future. If a device no longer exists, it can cause a kernel oops. This also makes this event useless when saving the ring buffer in userspaces tools such as perf and trace-cmd. The i915_gem_evict_vm dereferences the vm pointer which may also not exist when the ring buffer is read sometime in the future. Link: http://lkml.kernel.org/r/1395095198-20034-3-git-send-email-artagnon@gmail.com Reported-by: Ramkumar Ramachandra Fixes: bcccff847d1f "drm/i915: trace vm eviction instead of everything" Signed-off-by: Steven Rostedt [danvet: Try to make it actually compile] Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit 6dee738af60144c03cbfd0699bbb99f4ff4e021a Author: Jani Nikula Date: Fri Mar 28 08:54:04 2014 +0200 drm/i915/tv: fix gen4 composite s-video tv-out commit e1f23f3dd817f53f622e486913ac662add46eeed upstream. This is *not* bisected, but the likely regression is commit c35614380d5c956bfda20eab2755b2f5a7d6f1e7 Author: Zhao Yakui Date: Tue Nov 24 09:48:48 2009 +0800 drm/i915: Don't set up the TV port if it isn't in the BIOS table. The commit does not check for all TV device types that might be present in the VBT, disabling TV out for the missing ones. Add composite S-video. 
Reported-and-tested-by: Matthew Khouzam Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73362 Signed-off-by: Jani Nikula Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit 711bdfebba024019d77b2c3dead7f6599c527f5e Author: Alex Deucher Date: Wed Apr 2 08:42:49 2014 -0400 drm/radeon: fix typo in spectre_golden_registers commit f1553174a207f68a4ec19d436003097e0a4dc405 upstream. Signed-off-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Greg Kroah-Hartman commit 4f012fddb9ff166d58aa69aa90b23732b6624aa2 Author: Alex Deucher Date: Wed Apr 2 08:42:48 2014 -0400 drm/radeon: fix endian swap on hawaii clear state buffer setup commit a8947f576728a66bd3aac629bd8ca021a010c808 upstream. Need to swap on BE. Signed-off-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Greg Kroah-Hartman commit 1b91811383138a601c75752508aeec1d9541d593 Author: Alex Deucher Date: Mon Mar 31 11:19:46 2014 -0400 drm/radeon: call drm_edid_to_eld when we update the edid commit 16086279353cbfecbb3ead474072dced17b97ddc upstream. This needs to be done to update some of the fields in the connector structure used by the audio code. Noticed by several users on irc. Signed-off-by: Alex Deucher Signed-off-by: Christian König Signed-off-by: Greg Kroah-Hartman commit 42ef3a8d5ebaa2bd9308a32696c96843caebaf17 Author: Christian König Date: Tue Mar 25 11:41:40 2014 +0100 drm/radeon: clear needs_reset flag if IB test fails commit 06a139f7a0885fa2c84962300edd181821ddc2c9 upstream. If the IB test fails we don't want to reset the card over and over again, just accept that it isn't working. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=76501 Signed-off-by: Christian König Reviewed-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 255cd6032a99706de5566ab7a504224076060756 Author: Maarten Lankhorst Date: Tue Apr 1 15:15:47 2014 +0200 drm/qxl: unset a pointer in sync_obj_unref commit 41ccec352f3c823931a7d9d2a9c7880c14d7415a upstream. 
This fixes a BUG_ON(bo->sync_obj != NULL); in ttm_bo_release_list. Signed-off-by: Maarten Lankhorst Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman commit f8df1b4619aaed7a9d45cd5b7a84b854d7f46e4b Author: Thomas Hellstrom Date: Tue Apr 15 18:25:48 2014 +0200 drm/vmwgfx: Make sure user-space can't DMA across buffer object boundaries v2 commit cbd75e97a525e3819c02dc18bc2d67aa544c9e45 upstream. We already check that the buffer object we're accessing is registered with the file. Now also make sure that we can't DMA across buffer object boundaries. v2: Code commenting update. Signed-off-by: Thomas Hellstrom Reviewed-by: Jakob Bornecrantz Signed-off-by: Greg Kroah-Hartman commit 900bbb9d90c65a8fa4efa9dca20f5c780e698bae Author: Thomas Hellstrom Date: Mon Mar 31 10:20:30 2014 +0200 drm/vmwgfx: Fix query buffer locking order violation commit c8e5e010ef12df6707a1d711a5279a22f67a355e upstream. The query buffers were reserved while holding the binding mutex, which caused a circular locking dependency. Signed-off-by: Thomas Hellstrom Reviewed-by: Brian Paul Signed-off-by: Greg Kroah-Hartman commit 558399b4fa247e9841056bcc0116be6755fcc1ba Author: Christopher Friedt Date: Sat Feb 1 10:01:15 2014 -0500 drm/vmwgfx: correct fb_fix_screeninfo.line_length commit aa6de142c901cd2d90ef08db30ae87da214bedcc upstream. Previously, the vmwgfx_fb driver would allow users to call FBIOSET_VINFO, but it would not adjust the FINFO properly, resulting in distorted screen rendering. The patch corrects that behaviour. See https://bugs.gentoo.org/show_bug.cgi?id=494794 for examples. Signed-off-by: Christopher Friedt Reviewed-by: Thomas Hellstrom Signed-off-by: Greg Kroah-Hartman commit 9ad81155697312801f7b7ff60d1aa42bbbfd6ffb Author: Eliad Peller Date: Sun Apr 13 16:33:51 2014 +0300 wl18xx: align event mailbox with current fw commit c0da71ff4d2cbf113465bff9a7c413154be25a89 upstream. 
Some fields are missing from the event mailbox struct definitions, which cause issues when trying to handle some events. Add the missing fields in order to align the struct size (without adding actual support for the new fields). Reported-and-tested-by: Imre Kaloz Fixes: 028e724 ("wl18xx: move to new firmware (wl18xx-fw-3.bin)") Signed-off-by: Eliad Peller Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit 47ce856c8f85530d6586562f982493dae2a4423e Author: Thomas Bächler Date: Thu Apr 3 21:55:37 2014 +0200 fs: Don't return 0 from get_anon_bdev commit a2a4dc494a7b7135f460e38e788c4a58f65e4ac3 upstream. Commit 9e30cc9595303b27b48 removed an internal mount. This has the side-effect that rootfs now has FSID 0. Many userspace utilities assume that st_dev in struct stat is never 0, so this change breaks a number of tools in early userspace. Since we don't know how many userspace programs are affected, make sure that FSID is at least 1. References: http://article.gmane.org/gmane.linux.kernel/1666905 References: http://permalink.gmane.org/gmane.linux.utilities.util-linux-ng/8557 Signed-off-by: Thomas Bächler Acked-by: Tejun Heo Acked-by: H. Peter Anvin Tested-by: Alexandre Demers Signed-off-by: Greg Kroah-Hartman commit 2ad1de98b6fc6c7d04b00506e77b2aa1f7838c01 Author: Chris Mason Date: Tue Apr 15 18:09:24 2014 -0400 mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll commit c98235cb8584a72e95786e17d695a8e5fafcd766 upstream. The mlx4 driver is triggering schedules while atomic inside mlx4_en_netpoll: spin_lock_irqsave(&cq->lock, flags); napi_synchronize(&cq->napi); ^^^^^ msleep here mlx4_en_process_rx_cq(dev, cq, 0); spin_unlock_irqrestore(&cq->lock, flags); This was part of a patch by Alexander Guller from Mellanox in 2011, but it still isn't upstream. Signed-off-by: Chris Mason Acked-By: Amir Vadai Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit bee874114a1b87b0860b9cea170536df9d8b2042 Author: Tony Lindgren Date: Tue Mar 25 11:48:47 2014 -0700 serial: omap: Fix missing pm_runtime_resume handling by simplifying code commit d758c9c1b36b4d9a141c2146c70398d756167ed1 upstream. The lack of pm_runtime_resume handling for the device state leads into device wake-up interrupts not working after a while for runtime PM. Also, serial-omap is confused about the use of device_may_wakeup. The checks for device_may_wakeup should only be done for suspend and resume, not for pm_runtime_suspend and pm_runtime_resume. The wake-up events for PM runtime should always be enabled. The lack of pm_runtime_resume handling leads into device wake-up interrupts not working after a while for runtime PM. Rather than try to patch over the issue of adding complex tests to the pm_runtime_resume, let's fix the issues properly: 1. Make serial_omap_enable_wakeup deal with all internal PM state handling so we don't need to test for up->wakeups_enabled elsewhere. Later on once omap3 boots in device tree only mode we can also remove the up->wakeups_enabled flag and rely on the wake-up interrupt enable/disable state alone. 2. Do the device_may_wakeup checks in suspend and resume only, for runtime PM the wake-up events need to be always enabled. 3. Finally just call serial_omap_enable_wakeup and make sure we call it also in pm_runtime_resume. 4. Note that we also have to use disable_irq_nosync as serial_omap_irq calls pm_runtime_get_sync. Fixes: 2a0b965cfb6e (serial: omap: Add support for optional wake-up) Signed-off-by: Tony Lindgren Acked-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman commit 91144f7f183ad1b98afc8b40491c555f93c37690 Author: Bjørn Mork Date: Fri Apr 25 18:49:20 2014 +0200 usb: option: add and update a number of CMOTech devices commit 34f972d6156fe9eea2ab7bb418c71f9d1d5c8e7b upstream. A number of older CMOTech modems are based on Qualcomm chips. 
The blacklisted interfaces are QMI/wwan. Reported-by: Lars Melin Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman commit c42834bb0f1d1aee47cf075b54261417e308eaab Author: Bjørn Mork Date: Fri Apr 25 18:49:19 2014 +0200 usb: option: add Alcatel L800MA commit dd6b48ecec2ea7d15f28d5e5474388681899a5e1 upstream. Device interface layout: 0: ff/ff/ff - serial 1: ff/00/00 - serial AT+PPP 2: ff/ff/ff - QMI/wwan 3: 08/06/50 - storage Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman commit bd500d6ef4e98863bae87bf60d598be04eef06b2 Author: Bjørn Mork Date: Fri Apr 25 18:49:18 2014 +0200 usb: option: add Olivetti Olicard 500 commit 533b3994610f316e5cd61b56d0c4daa15c830f89 upstream. Device interface layout: 0: ff/ff/ff - serial 1: ff/ff/ff - serial AT+PPP 2: 08/06/50 - storage 3: ff/ff/ff - serial 4: ff/ff/ff - QMI/wwan Reported-by: Julio Araujo Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman commit 08a5ad7941b7ee50d2cca2e3bd3374ddbb48f01f Author: Bjørn Mork Date: Fri Apr 25 18:49:17 2014 +0200 usb: qcserial: add Sierra Wireless MC7305/MC7355 commit bce4f588f19d59fc07fadfeb0b2a3a06c942827a upstream. Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman commit 51b9a752fa1ca450ce7d021fbb20d5f2e5b08426 Author: Bjørn Mork Date: Fri Apr 25 18:49:16 2014 +0200 usb: qcserial: add Sierra Wireless MC73xx commit 70a3615fc07c2330ed7c1e922f3c44f4a67c0762 upstream. Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman commit 78398169431a3f508d40cb74417434adad20b342 Author: Bjørn Mork Date: Fri Apr 25 18:49:15 2014 +0200 usb: qcserial: add Sierra Wireless EM7355 commit a00986f81182a69dee4d2c48e8c19805bdf0f790 upstream. Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman commit 4bb336c3ebbf8891a7f1a113eefa0abbd55f3ddd Author: Johan Hovold Date: Fri Apr 25 15:23:03 2014 +0200 USB: io_ti: fix firmware download on big-endian machines commit 5509076d1b4485ce9fb07705fcbcd2695907ab5b upstream. 
During firmware download the device expects memory addresses in big-endian byte order. As the wIndex parameter which hold the address is sent in little-endian byte order regardless of host byte order, we need to use swab16 rather than cpu_to_be16. Also make sure to handle the struct ti_i2c_desc size parameter which is returned in little-endian byte order. Reported-by: Ludovic Drolez Tested-by: Ludovic Drolez Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit 0cc0d1a3f00220c9f2542115b6dc2f2fc4b91616 Author: Johan Hovold Date: Wed Apr 23 11:32:19 2014 +0200 USB: serial: fix sysfs-attribute removal deadlock commit 10164c2ad6d2c16809f6c09e278f946e47801b3a upstream. Fix driver new_id sysfs-attribute removal deadlock by making sure to not hold any locks that the attribute operations grab when removing the attribute. Specifically, usb_serial_deregister holds the table mutex when deregistering the driver, which includes removing the new_id attribute. This can lead to a deadlock as writing to new_id increments the attribute's active count before trying to grab the same mutex in usb_serial_probe. The deadlock can easily be triggered by inserting a sleep in usb_serial_deregister and writing the id of an unbound device to new_id during module unload. As the table mutex (in this case) is used to prevent subdriver unload during probe, it should be sufficient to only hold the lock while manipulating the usb-serial driver list during deregister. A racing probe will then either fail to find a matching subdriver or fail to get the corresponding module reference. 
Since v3.15-rc1 this also triggers the following lockdep warning: ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc2 #123 Tainted: G W ------------------------------------------------------- modprobe/190 is trying to acquire lock: (s_active#4){++++.+}, at: [] kernfs_remove_by_name_ns+0x4c/0x94 but task is already holding lock: (table_lock){+.+.+.}, at: [] usb_serial_deregister+0x3c/0x78 [usbserial] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (table_lock){+.+.+.}: [] __lock_acquire+0x1694/0x1ce4 [] lock_acquire+0xb4/0x154 [] _raw_spin_lock+0x4c/0x5c [] usb_store_new_id+0x14c/0x1ac [] new_id_store+0x68/0x70 [usbserial] [] drv_attr_store+0x30/0x3c [] sysfs_kf_write+0x5c/0x60 [] kernfs_fop_write+0xd4/0x194 [] vfs_write+0xbc/0x198 [] SyS_write+0x4c/0xa0 [] ret_fast_syscall+0x0/0x48 -> #0 (s_active#4){++++.+}: [] print_circular_bug+0x68/0x2f8 [] __lock_acquire+0x1928/0x1ce4 [] lock_acquire+0xb4/0x154 [] __kernfs_remove+0x254/0x310 [] kernfs_remove_by_name_ns+0x4c/0x94 [] remove_files.isra.1+0x48/0x84 [] sysfs_remove_group+0x58/0xac [] sysfs_remove_groups+0x34/0x44 [] driver_remove_groups+0x1c/0x20 [] bus_remove_driver+0x3c/0xe4 [] driver_unregister+0x38/0x58 [] usb_serial_bus_deregister+0x84/0x88 [usbserial] [] usb_serial_deregister+0x6c/0x78 [usbserial] [] usb_serial_deregister_drivers+0x2c/0x4c [usbserial] [] usb_serial_module_exit+0x14/0x1c [sierra] [] SyS_delete_module+0x184/0x210 [] ret_fast_syscall+0x0/0x48 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(table_lock); lock(s_active#4); lock(table_lock); lock(s_active#4); *** DEADLOCK *** 1 lock held by modprobe/190: #0: (table_lock){+.+.+.}, at: [] usb_serial_deregister+0x3c/0x78 [usbserial] stack backtrace: CPU: 0 PID: 190 Comm: modprobe Tainted: G W 3.15.0-rc2 #123 [] (unwind_backtrace) from [] (show_stack+0x20/0x24) [] (show_stack) 
from [] (dump_stack+0x24/0x28) [] (dump_stack) from [] (print_circular_bug+0x2ec/0x2f8) [] (print_circular_bug) from [] (__lock_acquire+0x1928/0x1ce4) [] (__lock_acquire) from [] (lock_acquire+0xb4/0x154) [] (lock_acquire) from [] (__kernfs_remove+0x254/0x310) [] (__kernfs_remove) from [] (kernfs_remove_by_name_ns+0x4c/0x94) [] (kernfs_remove_by_name_ns) from [] (remove_files.isra.1+0x48/0x84) [] (remove_files.isra.1) from [] (sysfs_remove_group+0x58/0xac) [] (sysfs_remove_group) from [] (sysfs_remove_groups+0x34/0x44) [] (sysfs_remove_groups) from [] (driver_remove_groups+0x1c/0x20) [] (driver_remove_groups) from [] (bus_remove_driver+0x3c/0xe4) [] (bus_remove_driver) from [] (driver_unregister+0x38/0x58) [] (driver_unregister) from [] (usb_serial_bus_deregister+0x84/0x88 [usbserial]) [] (usb_serial_bus_deregister [usbserial]) from [] (usb_serial_deregister+0x6c/0x78 [usbserial]) [] (usb_serial_deregister [usbserial]) from [] (usb_serial_deregister_drivers+0x2c/0x4c [usbserial]) [] (usb_serial_deregister_drivers [usbserial]) from [] (usb_serial_module_exit+0x14/0x1c [sierra]) [] (usb_serial_module_exit [sierra]) from [] (SyS_delete_module+0x184/0x210) [] (SyS_delete_module) from [] (ret_fast_syscall+0x0/0x48) Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit f11cdddc02f3a8d4c8be08479bb8a3622df0a0e1 Author: Johan Hovold Date: Fri Mar 28 18:05:10 2014 +0100 Revert "USB: serial: add usbid for dell wwan card to sierra.c" commit 2e01280d2801c72878cf3a7119eac30077b463d5 upstream. This reverts commit 1ebca9dad5abe8b2ed4dbd186cd657fb47c1f321. This device was erroneously added to the sierra driver even though it's not a Sierra device and was already handled by the option driver. 
Cc: Richard Farina Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit 4721ad1c2fcc3f30da7dc0698265db178c8aa47b Author: Daniele Palmas Date: Wed Apr 2 11:19:48 2014 +0200 usb: option driver, add support for Telit UE910v2 commit d6de486bc22255779bd54b0fceb4c240962bf146 upstream. option driver, added VID/PID for Telit UE910v2 modem Signed-off-by: Daniele Palmas Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit 965ebaefef9f0a3e44697950cc8f41c71cd9a462 Author: Michele Baldessari Date: Mon Mar 31 10:51:00 2014 +0200 USB: serial: ftdi_sio: add id for Brainboxes serial cards commit efe26e16b1d93ac0085e69178cc18811629e8fc5 upstream. Custom VID/PIDs for Brainboxes cards as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1071914 Signed-off-by: Michele Baldessari Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit 5c5822802722a45ec5cfbf36927b99728af673e2 Author: Johan Hovold Date: Thu Apr 3 13:06:46 2014 +0200 USB: usb_wwan: fix handling of missing bulk endpoints commit bd73bd8831696f189a479a0712ae95208e513d7e upstream. Fix regression introduced by commit 8e493ca1767d ("USB: usb_wwan: fix bulk-urb allocation") by making sure to require both bulk-in and out endpoints during port probe. The original option driver (which usb_wwan is based on) was written under the assumption that either endpoint could be missing, but evidently this cannot have been tested properly. Specifically, it would handle opening a device without bulk-in (but would blow up during resume which was implemented later), but not a missing bulk-out in write() (although it is handled in some places such as write_room()). Fortunately (?), the driver also got the test for missing endpoints wrong so the urbs were in fact always allocated, although they would be initialised using the wrong endpoint address (0) and any submission of such an urb would fail. 
The commit mentioned above fixed the test for missing endpoints but thereby exposed the other bugs which would now generate null-pointer exceptions rather than failed urb submissions. The regression was introduced in v3.7, but the offending commit was also marked for stable. Reported-by: Rafał Miłecki Signed-off-by: Johan Hovold Tested-by: Rafał Miłecki Signed-off-by: Greg Kroah-Hartman commit abb8fea9168d88ce14875070bca399fcf8274855 Author: Tristan Bruns Date: Sun Apr 13 23:57:16 2014 +0200 USB: cp210x: Add 8281 (Nanotec Plug & Drive) commit 72b3007951010ce1bbf950e23b19d9839fa905a5 upstream. Signed-off-by: Tristan Bruns Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit b843f630425c94974c45d9cf5e285f080ad39351 Author: Michael Ulbricht Date: Tue Mar 25 10:34:18 2014 +0100 USB: cdc-acm: Remove Motorola/Telit H24 serial interfaces from ACM driver commit 895d240d1db0b2736d779200788e4c4aea28a0c6 upstream. By specifying NO_UNION_NORMAL the ACM driver does only use the first two USB interfaces (modem data & control). The AT Port, Diagnostic and NMEA interfaces are left to the USB serial driver. Signed-off-by: Michael Ulbricht Signed-off-by: Alexander Stein Signed-off-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman commit 402e194dfc5b38d99f9c65b86e2666b29adebf8c Author: Mel Gorman Date: Fri Apr 18 15:07:21 2014 -0700 mm: use paravirt friendly ops for NUMA hinting ptes commit 29c7787075c92ca8af353acd5301481e6f37082f upstream. David Vrabel identified a regression when using automatic NUMA balancing under Xen whereby page table entries were getting corrupted due to the use of native PTE operations. Quoting him Xen PV guest page tables require that their entries use machine addresses if the preset bit (_PAGE_PRESENT) is set, and (for successful migration) non-present PTEs must use pseudo-physical addresses. 
This is because on migration MFNs in present PTEs are translated to PFNs (canonicalised) so they may be translated back to the new MFN in the destination domain (uncanonicalised). pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma() set and clear the _PAGE_PRESENT bit using pte_set_flags(), pte_clear_flags(), etc. In a Xen PV guest, these functions must translate MFNs to PFNs when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting _PAGE_PRESENT. His suggested fix converted p[te|md]_[set|clear]_flags to using paravirt-friendly ops but this is overkill. He suggested an alternative of using p[te|md]_modify in the NUMA page table operations but this does more work than necessary and would require looking up a VMA for protections. This patch modifies the NUMA page table operations to use paravirt friendly operations to set/clear the flags of interest. Unfortunately this will take a performance hit when updating the PTEs on CONFIG_PARAVIRT but I do not see a way around it that does not break Xen. Signed-off-by: Mel Gorman Acked-by: David Vrabel Tested-by: David Vrabel Cc: Ingo Molnar Cc: Peter Anvin Cc: Fengguang Wu Cc: Linus Torvalds Cc: Steven Noonan Cc: Rik van Riel Cc: Peter Zijlstra Cc: Andrea Arcangeli Cc: Dave Hansen Cc: Srikar Dronamraju Cc: Cyrill Gorcunov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit d568de4b73a2ffe3fe4ee35a87fdfa5807b097d0 Author: Mizuma, Masayoshi Date: Fri Apr 18 15:07:18 2014 -0700 mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages() commit 7848a4bf51b34f41fcc9bd77e837126d99ae84e3 upstream. The soft lockup when freeing gigantic hugepages, fixed in commit 55f67141a892 "mm: hugetlb: fix softlockup when a large number of hugepages are freed.", can also happen in return_unused_surplus_pages(), so let's fix it there too.
Signed-off-by: Masayoshi Mizuma Signed-off-by: Naoya Horiguchi Cc: Joonsoo Kim Cc: Michal Hocko Cc: Aneesh Kumar Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit e412868ecbda32653fe3412b7e0a45f6491cd591 Author: Kirill A. Shutemov Date: Fri Apr 18 15:07:25 2014 -0700 thp: close race between split and zap huge pages commit b5a8cad376eebbd8598642697e92a27983aee802 upstream. Sasha Levin has reported two THP BUGs[1][2]. I believe both of them have the same root cause. Let's look to them one by one. The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!". It's BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page(). From my testing I see that page_mapcount() is higher than mapcount here. I think it happens due to race between zap_huge_pmd() and page_check_address_pmd(). page_check_address_pmd() misses PMD which is under zap: CPU0 CPU1 zap_huge_pmd() pmdp_get_and_clear() __split_huge_page() anon_vma_interval_tree_foreach() __split_huge_page_splitting() page_check_address_pmd() mm_find_pmd() /* * We check if PMD present without taking ptl: no * serialization against zap_huge_pmd(). We miss this PMD, * it's not accounted to 'mapcount' in __split_huge_page(). */ pmd_present(pmd) == 0 BUG_ON(mapcount != page_mapcount(page)) // CRASH!!! page_remove_rmap(page) atomic_add_negative(-1, &page->_mapcount) The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!". It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd(). This happens in similar way: CPU0 CPU1 zap_huge_pmd() pmdp_get_and_clear() page_remove_rmap(page) atomic_add_negative(-1, &page->_mapcount) __split_huge_page() anon_vma_interval_tree_foreach() __split_huge_page_splitting() page_check_address_pmd() mm_find_pmd() pmd_present(pmd) == 0 /* The same comment as above */ /* * No crash this time since we already decremented page->_mapcount in * zap_huge_pmd(). 
*/ BUG_ON(mapcount != page_mapcount(page)) /* * We split the compound page here into small pages without * serialization against zap_huge_pmd() */ __split_huge_page_refcount() VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!! So my understanding is that the problem is the pmd_present() check in mm_find_pmd() without taking the page table lock. The bug was introduced by me with commit 117b0791ac42. Sorry for that. :( Let's open code mm_find_pmd() in page_check_address_pmd() and do the check under page table lock. Note that __page_check_address() does the same for PTE entries if sync != 0. I've stress tested split and zap code paths for 36+ hours by now and don't see crashes with the patch applied. Before, it took <20 min to trigger the first bug and a few hours for the second one (if we ignore the first). [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com> [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com> Signed-off-by: Kirill A. Shutemov Reported-by: Sasha Levin Tested-by: Sasha Levin Cc: Bob Liu Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Mel Gorman Cc: Michel Lespinasse Cc: Dave Jones Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 184de1005a3eea00bb4d1d2df3e545dadcf5e9aa Author: Peter Foley Date: Fri Apr 18 15:07:11 2014 -0700 init/Kconfig: move the trusted keyring config option to general setup commit 82c04ff89eba09d0e46e3f3649c6d3aa18e764a0 upstream. The SYSTEM_TRUSTED_KEYRING config option is not in any menu, causing it to show up in the toplevel of the kernel configuration. Fix this by moving it under the General Setup menu. Signed-off-by: Peter Foley Cc: David Howells Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 816c942dfb637bb5593cf58f10b7eab6895e7a66 Author: Steven Rostedt (Red Hat) Date: Fri May 2 13:30:04 2014 -0400 tracing: Use rcu_dereference_sched() for trace event triggers commit 561a4fe851ccab9dd0d14989ab566f9392d9f8b5 upstream.
As trace event triggers are now part of the mainline kernel, I added my trace event trigger tests to my test suite I run on all my kernels. Now these tests get run under different config options, and one of those options is CONFIG_PROVE_RCU, which checks under lockdep that the rcu locking primitives are being used correctly. This triggered the following splat: =============================== [ INFO: suspicious RCU usage. ] 3.15.0-rc2-test+ #11 Not tainted ------------------------------- kernel/trace/trace_events_trigger.c:80 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 4 locks held by swapper/1/0: #0: ((&(&j_cdbs->work)->timer)){..-...}, at: [] call_timer_fn+0x5/0x1be #1: (&(&pool->lock)->rlock){-.-...}, at: [] __queue_work+0x140/0x283 #2: (&p->pi_lock){-.-.-.}, at: [] try_to_wake_up+0x2e/0x1e8 #3: (&rq->lock){-.-.-.}, at: [] try_to_wake_up+0x1a0/0x1e8 stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc2-test+ #11 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 0000000000000001 ffff88007e083b98 ffffffff819f53a5 0000000000000006 ffff88007b0942c0 ffff88007e083bc8 ffffffff81081307 ffff88007ad96d20 0000000000000000 ffff88007af2d840 ffff88007b2e701c ffff88007e083c18 Call Trace: [] dump_stack+0x4f/0x7c [] lockdep_rcu_suspicious+0x107/0x110 [] event_triggers_call+0x99/0x108 [] ftrace_event_buffer_commit+0x42/0xa4 [] ftrace_raw_event_sched_wakeup_template+0x71/0x7c [] ttwu_do_wakeup+0x7f/0xff [] ttwu_do_activate.constprop.126+0x5c/0x61 [] try_to_wake_up+0x1ac/0x1e8 [] wake_up_process+0x36/0x3b [] wake_up_worker+0x24/0x26 [] insert_work+0x5c/0x65 [] __queue_work+0x26c/0x283 [] ? __queue_work+0x283/0x283 [] delayed_work_timer_fn+0x1e/0x20 [] call_timer_fn+0xdf/0x1be [] ? call_timer_fn+0x5/0x1be [] ? 
__queue_work+0x283/0x283 [] run_timer_softirq+0x1a4/0x22f [] __do_softirq+0x17b/0x31b [] irq_exit+0x42/0x97 [] smp_apic_timer_interrupt+0x37/0x44 [] apic_timer_interrupt+0x6f/0x80 [] ? default_idle+0x21/0x32 [] ? default_idle+0x1f/0x32 [] arch_cpu_idle+0xf/0x11 [] cpu_startup_entry+0x1a3/0x213 [] start_secondary+0x212/0x219 The cause is that the triggers are protected by rcu_read_lock_sched() but the data is dereferenced with rcu_dereference() which expects it to be protected with rcu_read_lock(). The proper reference should be rcu_dereference_sched(). Cc: Tom Zanussi Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit ff3db3fa612bedd7fa2c6ed3f32f850929f2c7d7 Author: zhangwei(Jovi) Date: Thu Apr 17 16:05:19 2014 +0800 tracing/uprobes: Fix uprobe_cpu_buffer memory leak commit 6ea6215fe394e320468589d9bba464a48f6d823a upstream. Forgot to free uprobe_cpu_buffer percpu page in uprobe_buffer_disable(). Link: http://lkml.kernel.org/p/534F8B3F.1090407@huawei.com Acked-by: Namhyung Kim Signed-off-by: zhangwei(Jovi) Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 0aa769c91d405d6bee480513ae490fe1360e74ae Author: NeilBrown Date: Wed Apr 9 12:25:43 2014 +1000 md/raid1: r1buf_pool_alloc: free allocate pages when subsequent allocation fails. commit da1aab3dca9aa88ae34ca392470b8943159e25fe upstream. When performing a user-request check/repair (MD_RECOVERY_REQUEST is set) on a raid1, we allocate multiple bios each with their own set of pages. If the page allocations for one bio fail, we currently do *not* free the pages allocated for the previous bios, nor do we free the bio itself. This patch frees all the already-allocated pages, and makes sure that all the bios are freed as well. This bug can cause a memory leak which can ultimately OOM a machine. It was introduced in 3.10-rc1.
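The cleanup pattern described above can be sketched in user-space C. This is only an illustration under stated assumptions: alloc_all(), counting_alloc() and allocs_left are hypothetical names, plain malloc()/free() stand in for the kernel's page and bio allocators, and none of this is the actual r1buf_pool_alloc() code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator type so that a failure can be injected. */
typedef void *(*alloc_fn)(size_t size);

/* Hypothetical fault injector: fails once allocs_left reaches 0.
 * A negative value means "never fail". */
int allocs_left = -1;

void *counting_alloc(size_t size)
{
	if (allocs_left == 0)
		return NULL;
	if (allocs_left > 0)
		allocs_left--;
	return malloc(size);
}

/*
 * Allocate n buffers. If any allocation fails, free every buffer that
 * was already allocated before returning, so nothing leaks.
 * Returns 0 on success, -1 on failure.
 */
int alloc_all(void **bufs, size_t n, size_t size, alloc_fn alloc)
{
	assert(bufs != NULL && alloc != NULL);

	for (size_t i = 0; i < n; i++) {
		bufs[i] = alloc(size);
		if (bufs[i] == NULL) {
			/* Unwind: release buffers 0..i-1. */
			while (i > 0) {
				free(bufs[--i]);
				bufs[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

The point of the sketch is the unwind loop: without it, an early return on failure would leak every buffer obtained in earlier iterations, which is the class of leak the patch addresses.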
Fixes: a07876064a0b73ab5ef1ebcf14b1cf0231c07858 Cc: Kent Overstreet Reported-by: Russell King - ARM Linux Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman commit c3feeab731f8e098a8d4de42a0c6e65d79877721 Author: Hans de Goede Date: Fri May 2 19:48:13 2014 +0200 HID: add NO_INIT_REPORTS quirk for Synaptics Touch Pad V 103S commit 2f433083e854ec72c19dc9b0e1cebcc8e230fd75 upstream. This touchpad seriously dislikes init reports, not only timing out, but also refusing to work after this. Reported-and-tested-by: Vincent Fortier Signed-off-by: Hans de Goede Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit a2c5ef23686ea5a07095073165c6ccae225ef3ed Author: Benjamin Tissoires Date: Mon Mar 31 13:27:10 2014 -0400 HID: core: do not scan constant input report commit e24d0d399b2fce71b627043e900ef28283850482 upstream. The Microsoft Surface Type/Touch Cover 2 is a fancy device which advertises itself as a multitouch device but with constant input reports. This way, hid_scan_report() gives the group MULTITOUCH to it, but hid-multitouch cannot handle it due to the constant collection ignored by hid-input. To prevent such crap in the future, and while we do not fix this particular device, make the scan_report coherent with hid-input.c, and ignore constant input reports. Signed-off-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit 346f2b1eae0cf0d0d34458d26a9f32df46e8641a Author: Derya Date: Mon Mar 31 13:27:09 2014 -0400 Revert "HID: microsoft: Add ID's for Surface Type/Touch Cover 2" commit f3b0cbce01cd5c242b420d986b208d306bdc5083 upstream. This reverts commit 117309c51dca42121f70cacec801511b76acf75c. The MS Surface Pro 2 has a USB composite device with 3 interfaces - interface 0 - sensor hub - interface 1 - wacom digitizer - interface 2 - the keyboard cover, if one is attached This USB composite device changes its product id depending on whether and which keyboard cover is attached.
Adding the covers to hid_have_special_driver prevents loading the right hid drivers for the other two interfaces, all 3 get loaded with hid-microsoft. We don't even need hid-microsoft for the keyboards. We have to revert this to load the right hid modules for each interface. Signed-off-by: Derya Signed-off-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit f45babbd21f7e603acdcd620afa77571fbb283d5 Author: Vladimir Murzin Date: Sun Apr 27 10:09:12 2014 +0100 xen/events/fifo: correctly align bitops commit 05a812ac474d0d6aef6d54b66bb08b81abde79c6 upstream. FIFO event channels require bitops on 32-bit aligned values (the event words). Linux's bitops require unsigned long alignment which may be 64-bits. On arm64 an incorrectly unaligned access will fault. Fix this by aligning the bitops along with an adjustment for bit position and using an unsigned long for the local copy of the ready word. Signed-off-by: Vladimir Murzin Tested-by: Pranavkumar Sawargaonkar Reviewed-by: Ian Campbell Signed-off-by: David Vrabel Signed-off-by: Greg Kroah-Hartman commit 540299d7c8e1946f3af29a632661e3b49cd12fd5 Author: Konrad Rzeszutek Wilk Date: Fri Apr 4 14:48:04 2014 -0400 xen/spinlock: Don't enable them unconditionally. commit e0fc17a936334c08b2729fff87168c03fdecf5b6 upstream. The git commit a945928ea2709bc0e8e8165d33aed855a0110279 ('xen: Do not enable spinlocks before jump_label_init() has executed') was added to deal with the jump machinery. Earlier the code that turned on the jump label was only called by Xen specific functions. But now that it had been moved to the initcall machinery it gets called on Xen, KVM, and baremetal - ouch!. And the detection machinery to only call it on Xen wasn't remembered in the heat of merge window excitement. This means that the slowpath is enabled on baremetal while it should not be. 
Reported-by: Waiman Long Acked-by: Steven Rostedt CC: Boris Ostrovsky Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: David Vrabel Signed-off-by: Greg Kroah-Hartman commit aedc82aa4086dd2a7b83bbe1e123bb340c8aa406 Author: Sachin Prabhu Date: Tue Mar 11 16:11:47 2014 +0000 cifs: Wait for writebacks to complete before attempting write. commit c11f1df5003d534fd067f0168bfad7befffb3b5c upstream. Problem reported in Red Hat bz 1040329 for strict writes where we cache only when we hold oplock and write direct to the server when we don't. When we receive an oplock break, we first change the oplock value for the inode in cifsInodeInfo->oplock to indicate that we no longer hold the oplock before we enqueue a task to flush changes to the backing device. Once we have completed flushing the changes, we return the oplock to the server. There are 2 ways here where we can have data corruption 1) While we flush changes to the backing device as part of the oplock break, we can have processes write to the file. These writes check for the oplock, find none and attempt to write directly to the server. These direct writes made while we are flushing from cache could be overwritten by data being flushed from the cache causing data corruption. 2) While a thread runs in cifs_strict_writev, the machine could receive and process an oplock break after the thread has checked the oplock and found that it allows us to cache and before we have made changes to the cache. In that case, we end up with a dirty page in cache when we shouldn't have any. This will be flushed later and will overwrite all subsequent writes to the part of the file represented by this page. Before making any writes to the server, we need to confirm that we are not in the process of flushing data to the server and if we are, we should wait until the process is complete before we attempt the write. We should also wait for existing writes to complete before we process an oplock break request which changes oplock values. 
We add a version specific downgrade_oplock() operation to allow for differences in the oplock values set for the different smb versions. Signed-off-by: Sachin Prabhu Reviewed-by: Jeff Layton Reviewed-by: Pavel Shilovsky Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 2419da02f5ff42cb8af85c0b6a3916b6187abf2c Author: Al Viro Date: Fri Mar 14 10:56:20 2014 -0400 don't bother with {get,put}_write_access() on non-regular files commit dd20908a8a06b22c171f6c3fcdbdbd65bed07505 upstream. it's pointless and actually leads to wrong behaviour in at least one moderately convoluted case (pipe(), close one end, try to get to another via /proc/*/fd and run into ETXTBUSY). Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman commit 7147ad6410de7f330650e47b198815fe0d907a65 Author: John David Anglin Date: Sun Apr 27 16:20:47 2014 -0400 parisc: remove _STK_LIM_MAX override commit e0d8898d76a785453bfaf6cd08b830a7d5189f78 upstream. There are only a couple of architectures that override _STK_LIM_MAX to a non-infinity value. This changes the stack allocation semantics in subtle ways. For example, GNU make changes its stack allocation to the hard maximum defined by _STK_LIM_MAX. As a result, threads executed by processes running under make are allocated a stack size of _STK_LIM_MAX rather than a sensible default value. This causes various thread stress tests to fail when they can't muster more than about 50 threads. The attached change implements the default behavior used by the majority of architectures. Signed-off-by: John David Anglin Reviewed-by: Carlos O'Donell Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit 3e7b54d984d455fe779c481972384082af1146c6 Author: Helge Deller Date: Sun Apr 13 00:03:55 2014 +0200 parisc: fix epoll_pwait syscall on compat kernel commit ab3e55b119c9653b19ea4edffb86f04db867ac98 upstream. This bug was detected with the libio-epoll-perl debian package where the test case IO-Ppoll-compat.t failed.
Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit c400a1db3399aad8141d88df4545e3f50b38c416 Author: Helge Deller Date: Wed Apr 9 19:49:28 2014 +0200 parisc: change value of SHMLBA from 0x00400000 to PAGE_SIZE commit 0ef36bd2b37815719e31a72d2beecc28ca8ecd26 upstream. On parisc, SHMLBA was defined to 0x00400000 (4MB) to reflect that we need to take care of our caches for shared mappings. But actually, we can map a file at any multiple address of PAGE_SIZE, so let us correct that now with a value of PAGE_SIZE for SHMLBA. Instead we now take care of this cache colouring via the constant SHM_COLOUR while we map shared pages. Signed-off-by: Helge Deller CC: Jeroen Roovers CC: John David Anglin CC: Carlos O'Donell Signed-off-by: Greg Kroah-Hartman commit 669d557d58abebcc1de06cf7f2686612df6adc5c Author: Viresh Kumar Date: Tue Apr 15 10:54:41 2014 +0530 tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz() commit 27630532ef5ead28b98cfe28d8f95222ef91c2b7 upstream. Since commit d689fe222 (NOHZ: Check for nohz active instead of nohz enabled) the tick_nohz_switch_to_nohz() function returns because it checks for the tick_nohz_active flag. This can't be set, because the function itself sets it. Undo the change in tick_nohz_switch_to_nohz(). Signed-off-by: Viresh Kumar Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: Arvind.Chauhan@arm.com Cc: linaro-networking@linaro.org Link: http://lkml.kernel.org/r/40939c05f2d65d781b92b20302b02243d0654224.1397537987.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit f13588b86969b02e3d74b55ac04956461f898a3d Author: Viresh Kumar Date: Tue Apr 15 10:54:40 2014 +0530 tick-sched: Don't call update_wall_time() when delta is lesser than tick_period commit 03e6bdc5c4d0fc166bfd5d3cf749a5a0c1b5b1bd upstream. In tick_do_update_jiffies64() we are processing ticks only if delta is greater than tick_period. 
This is what we are supposed to do here and it broke a bit with this patch: commit 47a1b796 (tick/timekeeping: Call update_wall_time outside the jiffies lock) With the above patch, we might end up calling update_wall_time() even if delta is found to be smaller than tick_period. Fix this by returning when the delta is less than tick period. [ tglx: Made it a 3 liner and massaged changelog ] Signed-off-by: Viresh Kumar Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: Arvind.Chauhan@arm.com Cc: linaro-networking@linaro.org Cc: John Stultz Link: http://lkml.kernel.org/r/80afb18a494b0bd9710975bcc4de134ae323c74f.1397537987.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit c70d62a42c2da17e9d6d92670bf1c61b09fc0f5c Author: Viresh Kumar Date: Tue Apr 15 10:54:37 2014 +0530 tick-common: Fix wrong check in tick_check_replacement() commit 521c42990e9d561ed5ed9f501f07639d0512b3c9 upstream. tick_check_replacement() returns if a replacement of clock_event_device is possible or not. It does this as the first check: if (tick_check_percpu(curdev, newdev, smp_processor_id())) return false; That's wrong. tick_check_percpu() returns true when the device is usable. Check for false instead.
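A minimal user-space sketch of the inverted check, for illustration only. struct clockdev, check_percpu() and check_preferred() are hypothetical stand-ins for the kernel's clock_event_device and its helpers, and the rating comparison is an assumed simplification of the real preference logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct clock_event_device. */
struct clockdev {
	int rating;      /* higher is better */
	bool percpu_ok;  /* usable as a per-cpu device on this CPU */
};

/* Stand-in for tick_check_percpu(): true when the device is usable. */
bool check_percpu(const struct clockdev *cur, const struct clockdev *newdev)
{
	(void)cur;
	assert(newdev != NULL);
	return newdev->percpu_ok;
}

/* Assumed simplification: prefer the new device if it has a higher rating. */
bool check_preferred(const struct clockdev *cur, const struct clockdev *newdev)
{
	return cur == NULL || newdev->rating > cur->rating;
}

/* Before the fix: bails out exactly when the device IS usable. */
bool check_replacement_buggy(const struct clockdev *cur,
			     const struct clockdev *newdev)
{
	if (check_percpu(cur, newdev))
		return false;
	return check_preferred(cur, newdev);
}

/* After the fix: bails out only when the device is NOT usable. */
bool check_replacement_fixed(const struct clockdev *cur,
			     const struct clockdev *newdev)
{
	if (!check_percpu(cur, newdev))
		return false;
	return check_preferred(cur, newdev);
}
```

With a usable, higher-rated candidate, the buggy variant rejects the replacement while the fixed variant accepts it.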
[ tglx: Massaged changelog ] Signed-off-by: Viresh Kumar Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: Arvind.Chauhan@arm.com Cc: linaro-networking@linaro.org Link: http://lkml.kernel.org/r/486a02efe0246635aaba786e24b42d316438bf3b.1397537987.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit e4b2de2daa7c854b661d59196ce4ff996b948ecd Author: Jani Nikula Date: Mon Apr 28 13:10:07 2014 +0300 drm/i915: restore QUIRK_NO_PCH_PWM_ENABLE This reverts the bisected regressing commit bc0bb9fd1c7810407ab810d204bbaecb255fddde Author: Jani Nikula Date: Thu Nov 14 12:14:29 2013 +0200 drm/i915: remove QUIRK_NO_PCH_PWM_ENABLE restoring QUIRK_NO_PCH_PWM_ENABLE for a couple of Dell XPS models which broke in 3.14. There is no such revert upstream. We have root caused and fixed the issue upstream, without the quirk, with: commit 39fbc9c8f6765959b55e0b127dd5c57df5a47d67 Author: Jani Nikula Date: Wed Apr 9 11:22:06 2014 +0300 drm/i915: check VBT for supported backlight type and commit c675949ec58ca50d5a3ae3c757892f1560f6e896 Author: Jani Nikula Date: Wed Apr 9 11:31:37 2014 +0300 drm/i915: do not setup backlight if not available according to VBT While the commits are within the stable rules otherwise, and fix more machines than just the regressed Dell XPS models, we feel backporting them to stable may be too risky. The revert is limited to the broken machines, and the impact should be effectively the same as what the upstream commits do more generally. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76276 Reported-by: Romain Francoise Acked-by: Daniel Vetter Tested-by: Kamal Mostafa Signed-off-by: Jani Nikula Tested-by: Romain Francoise Signed-off-by: Greg Kroah-Hartman commit 524cf7932e479d835e5677b588b61836821e7dfe Author: Ilya Dryomov Date: Tue Mar 4 11:57:17 2014 +0200 rbd: fix error paths in rbd_img_request_fill() commit 42dd037c08c7cd6e3e9af7824b0c1d063f838885 upstream. 
Doing rbd_obj_request_put() in rbd_img_request_fill() error paths is not only insufficient, but also triggers an rbd_assert() in rbd_obj_request_destroy(): Assertion failure in rbd_obj_request_destroy() at line 1867: rbd_assert(obj_request->img_request == NULL); rbd_img_obj_request_add() adds obj_requests to the img_request, the opposite is rbd_img_obj_request_del(). Use it. Fixes: http://tracker.ceph.com/issues/7327 Signed-off-by: Ilya Dryomov Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman commit 6cb4463aeef0df3f8c493783d4fab4687d27f715 Author: Steven Rostedt (Red Hat) Date: Wed Feb 26 10:54:36 2014 -0500 tracepoint: Do not waste memory on mods with no tracepoints commit 7dec935a3aa04412cba2cebe1524ae0d34a30c24 upstream. No reason to allocate tp_module structures for modules that have no tracepoints. This just wastes memory. Fixes: b75ef8b44b1c "Tracepoint: Dissociate from module mutex" Acked-by: Mathieu Desnoyers Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 4c03d4699182312ed42257834b915492af16022a Author: Peter Zijlstra Date: Wed Apr 9 16:24:47 2014 +0200 x86,preempt: Fix preemption for i386 Many people reported preemption/reschedule problems with i386 kernels for .13 and .14. After Michele bisected this to a combination of 3e8e42c69bb ("sched: Revert need_resched() to look at TIF_NEED_RESCHED") ded79754754 ("irq: Force hardirq exit's softirq processing on its own stack") it finally dawned on me that i386's current_thread_info() was to blame. When we are on interrupt/exception stacks, we fail to observe the right TIF_NEED_RESCHED bit and therefore the PREEMPT_NEED_RESCHED folding malfunctions. 
Current upstream fixes this by making i386 behave the same as x86_64 already did: 2432e1364bbe ("x86: Nuke the supervisor_stack field in i386 thread_info") b807902a88c4 ("x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386") 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure") 198d208df437 ("x86: Keep thread_info on thread stack in x86_32") However, that is far too much to stuff into -stable. Therefore I propose we merge the below patch which uses task_thread_info(current) for tif_need_resched() instead of the ESP based current_thread_info(). This makes sure we always observe the one true TIF_NEED_RESCHED bit and things will work as expected again. Cc: bp@alien8.de Cc: fweisbec@gmail.com Cc: david.a.cohen@linux.intel.com Cc: mingo@kernel.org Cc: fweisbec@gmail.com Cc: greg@kroah.com Cc: Steven Rostedt Cc: gregkh@linuxfoundation.org Cc: pbonzini@redhat.com Cc: rostedt@goodmis.org Cc: stefan.bader@canonical.com Cc: mingo@kernel.org Cc: toralf.foerster@gmx.de Cc: David Cohen Cc: Steven Rostedt Cc: torvalds@linux-foundation.org Cc: Paolo Bonzini Cc: David Cohen Cc: Borislav Petkov Cc: Paolo Bonzini Cc: Borislav Petkov Cc: peterz@infradead.org Cc: Linus Torvalds Cc: barra_cuda@katamail.com Tested-by: Stefan Bader Tested-by: Toralf Förster Tested-by: Michele Ballabio Signed-off-by: Greg Kroah-Hartman Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/20140409142447.GD13658@twins.programming.kicks-ass.net commit 4d43406bd06f92ab86c31027a2c313b36dc4ba39 Author: Pablo Neira Ayuso Date: Mon Mar 24 15:10:37 2014 +0100 netfilter: nf_tables: set names cannot be larger than 15 bytes commit a9bdd8365684810e3de804f8c51e52c26a5eccbb upstream. Currently, nf_tables trims off the set name if it exceeds 15 bytes, so explicitly reject set names that are too large.
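The difference between silently trimming and explicitly rejecting can be sketched in user-space C. This is an assumption-laden illustration, not the nf_tables code: SET_NAME_MAX, set_name_copy() and the choice of -ENAMETOOLONG as the error value are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Assumed buffer size: 15 name bytes plus the terminating NUL. */
#define SET_NAME_MAX 16

/*
 * Copy a set name into a fixed-size buffer. Instead of truncating a
 * name that does not fit, reject it so the caller sees the problem.
 * Returns 0 on success, or a negative errno value (assumed here to be
 * -ENAMETOOLONG) when the name is longer than 15 bytes.
 */
int set_name_copy(char dst[SET_NAME_MAX], const char *src)
{
	size_t len;

	assert(dst != NULL && src != NULL);

	len = strlen(src);
	if (len >= SET_NAME_MAX)
		return -ENAMETOOLONG;

	memcpy(dst, src, len + 1);	/* include the NUL terminator */
	return 0;
}
```

A 15-byte name is accepted unchanged, while a 16-byte name is refused rather than stored in a shortened form that no longer matches what the caller asked for.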
Reported-by: Giuseppe Longo Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 8b5740915a9faa8b1fa9166193a33e2a9ae30ec6 Author: Thomas Graf Date: Fri Apr 4 17:57:45 2014 +0200 netfilter: Can't fail and free after table replacement commit c58dd2dd443c26d856a168db108a0cd11c285bf3 upstream. All xtables variants suffer from the defect that the copy_to_user() to copy the counters to user memory may fail after the table has already been exchanged and thus exposed. Returning an error at this point will result in freeing the already exposed table. Any subsequent packet processing will result in a kernel panic. We can't copy the counters before exposing the new tables as we want to provide the counter state after the old table has been unhooked. Therefore convert this into a silent error. Cc: Florian Westphal Signed-off-by: Thomas Graf Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 0a8eda9c00ef37e8b40de77f2b0714317191bcf2 Author: Roman Pen Date: Tue Mar 4 23:13:10 2014 +0900 blktrace: fix accounting of partially completed requests commit af5040da01ef980670b3741b3e10733ee3e33566 upstream. trace_block_rq_complete does not take into account that request can be partially completed, so we can get the following incorrect output of blkparser: C R 232 + 240 [0] C R 240 + 232 [0] C R 248 + 224 [0] C R 256 + 216 [0] but should be: C R 232 + 8 [0] C R 240 + 8 [0] C R 248 + 8 [0] C R 256 + 8 [0] Also, the whole output summary statistics of completed requests and final throughput will be incorrect. This patch takes into account real completion size of the request and fixes wrong completion accounting.

    Signed-off-by: Roman Pen
    CC: Steven Rostedt
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: linux-kernel@vger.kernel.org
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

commit 7a6f558b35e2b196eca1d40b48d37e8bcdc73d19
Author: Andrey Vagin
Date:   Fri Mar 28 13:54:32 2014 +0400

    netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len

    commit 223b02d923ecd7c84cf9780bb3686f455d279279 upstream.

    "len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
    case it can contain all extensions. Below you can find sizes for all
    types of extensions. Their sum is definitely bigger than 256.

      nf_ct_ext_types[0]->len = 24
      nf_ct_ext_types[1]->len = 32
      nf_ct_ext_types[2]->len = 24
      nf_ct_ext_types[3]->len = 32
      nf_ct_ext_types[4]->len = 152
      nf_ct_ext_types[5]->len = 2
      nf_ct_ext_types[6]->len = 16
      nf_ct_ext_types[7]->len = 8

    I have seen "len" up to 280 and my host crashes w/o this patch.

    The right way to fix this problem is reducing the size of the ecache
    extension (4) and Florian is going to do this, but these changes
    would be too large to be appropriate for a stable tree.

    Fixes: 5b423f6a40a0 (netfilter: nf_conntrack: fix racy timer handling with reliable)
    Cc: Pablo Neira Ayuso
    Cc: Patrick McHardy
    Cc: Jozsef Kadlecsik
    Cc: "David S. Miller"
    Signed-off-by: Andrey Vagin
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit d6421db1dbd6a0c6fd6626c6c59d29204db76434
Author: Patrick McHardy
Date:   Sat Apr 12 13:17:57 2014 +0200

    netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4

    commit b855d416dc17061ebb271ea7ef1201d100531770 upstream.

    nft_cmp_fast is used for equality comparisons of size <= 4. For
    comparisons of size < 4 bytes a mask is calculated that is applied to
    both the data from userspace (during initialization) and the register
    value (during runtime). Both values are stored using (in effect)
    memcpy to a memory area that is then interpreted as u32 by
    nft_cmp_fast.

    This works fine on little endian since smaller types have the same
    base address, however on big endian this is not true and the smaller
    types are interpreted as a big number with trailing zero bytes.

    The mask therefore must not include the lower bytes, but the higher
    bytes on big endian. Add a helper function that does a cpu_to_le32 to
    switch the bytes on big endian. Since we're dealing with a mask of
    just consecutive bits, this works out fine.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit f661428d2efe8e3dc0912b79305e36bde54a8068
Author: Richard Guy Briggs
Date:   Tue Dec 10 22:10:41 2013 -0500

    audit: convert PPIDs to the inital PID namespace.

    commit c92cdeb45eea38515e82187f48c2e4f435fb4e25 upstream.

    sys_getppid() returns the parent pid of the current process in its
    own pid namespace. Since audit filters are based in the init pid
    namespace, a process could avoid a filter or trigger an unintended
    one by being in an alternate pid namespace or log meaningless
    information.

    Switch to task_ppid_nr() for PPIDs to anchor all audit filters in the
    init_pid_ns.

    (informed by ebiederman's 6c621b7e)
    Cc: Eric W. Biederman
    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Greg Kroah-Hartman

commit 6960029959f333bed9b0690c58429256fddd0a17
Author: Richard Guy Briggs
Date:   Thu Aug 15 18:05:12 2013 -0400

    pid: get pid_t ppid of task in init_pid_ns

    commit ad36d28293936b03d6b7996e9d6aadfd73c0eb08 upstream.

    Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract
    the lookup of the PPID (real_parent's pid_t) of a process, including
    rcu locking, in the arbitrary and init_pid_ns.
    This provides an alternative to sys_getppid(), which is relative to
    the child process' pid namespace.

    (informed by ebiederman's 6c621b7e)
    Cc: Eric W. Biederman
    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Greg Kroah-Hartman

commit be57abca400415c343a251053db2ba8e8cf873ad
Author: Steven Rostedt
Date:   Tue Apr 22 19:23:30 2014 -0400

    tools lib traceevent: Fix memory leak in pretty_print()

    commit de04f8657de9d3351a2d5880f1f7080b23b798cf upstream.

    Commit 12e55569a244 "tools lib traceevent: Use helper trace-seq in
    print functions like kernel does" added an extra trace_seq helper to
    process string arguments like the kernel does it. But the difference
    between the kernel and the userspace library is that the kernel's
    trace_seq structure has a statically allocated buffer. The userspace
    one has a dynamically allocated one. It requires a
    trace_seq_destroy(), otherwise it produces a nasty memory leak.

    Signed-off-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/20140422192330.6bb09bf8@gandalf.local.home
    Signed-off-by: Jiri Olsa
    Signed-off-by: Greg Kroah-Hartman

commit 8c02c2a4f89a4eda43b4679e8f0e170edeebc85f
Author: Marcelo Tosatti
Date:   Thu Apr 10 18:19:12 2014 -0300

    KVM: x86: remove WARN_ON from get_kernel_ns()

    commit b351c39cc9e0151cee9b8d52a1e714928faabb38 upstream.

    Function and callers can be preempted.

    https://bugzilla.kernel.org/show_bug.cgi?id=73721

    Signed-off-by: Marcelo Tosatti
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

commit aa726923293e27cf9d3ee572d4a674c3818911e4
Author: Dan Carpenter
Date:   Wed Oct 30 20:13:51 2013 +0300

    SCSI: megaraid: missing bounds check in mimd_to_kioc()

    commit 3de2260140417759c669d391613d583baf03b0cf upstream.

    pthru32->dataxferlen comes from the user so we need to check that
    it's not too large so we don't overflow the buffer.
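The kind of check described above can be sketched in userspace as follows. The buffer size, the helper name and the -EINVAL return value are all assumptions for illustration; the driver's real limit and error handling are not given in this message.

```c
#include <errno.h>

/* Hypothetical buffer capacity; the driver's real limit is not stated above. */
#define SKETCH_BUF_SIZE 4096u

/*
 * Hypothetical helper: validate a user-supplied transfer length before
 * it is used to copy data into a buffer of SKETCH_BUF_SIZE bytes.
 * Returns 0 if the length fits and -EINVAL if it would overflow.
 */
static int sketch_check_xferlen(unsigned int user_len)
{
	if (user_len > SKETCH_BUF_SIZE)
		return -EINVAL;

	return 0;
}
```

The point of the sketch is the ordering: the length is validated before any copy takes place, so an oversized value from userspace is refused rather than clamped or trusted.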

    Reported-by: Nico Golde
    Reported-by: Fabian Yamaguchi
    Signed-off-by: Dan Carpenter
    Acked-by: Sumit Saxena
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

commit 50b9f5b92eeee5b1ec5495d0ba1645f900855a1b
Author: James Bottomley
Date:   Tue Jan 21 07:01:41 2014 -0800

    SCSI: dual scan thread bug fix

    commit f2495e228fce9f9cec84367547813cbb0d6db15a upstream.

    In the highly unusual case where two threads are running concurrently
    through the scanning code scanning the same target, we run into the
    situation where one may allocate the target while the other is still
    using it. In this case, because the reap checks for STARGET_CREATED
    and kills the target without reference counting, the second thread
    will do the wrong thing on reap.

    Fix this by reference counting even creates and doing the
    STARGET_CREATED check in the final put.

    Tested-by: Sarah Sharp
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

commit bf1802de75b7e3aab950efde9efb61637e4f449f
Author: James Bottomley
Date:   Tue Jan 21 07:00:50 2014 -0800

    scsi: fix our current target reap infrastructure

    commit e63ed0d7a98014fdfc2cfeb3f6dada313dcabb59 upstream.

    This patch eliminates the reap_ref and replaces it with a proper
    kref. On last put of this kref, the target is removed from visibility
    in sysfs. The final call to scsi_target_reap() for the device is done
    from __scsi_remove_device() and only if the device was made visible.
    This ensures that the target disappears as soon as the last device is
    gone rather than waiting until final release of the device (which is
    often too long).

    Reviewed-by: Alan Stern
    Tested-by: Sarah Sharp
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman
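The "release on last put" pattern that the target reap commit above relies on can be illustrated with a simplified, single-threaded sketch. It is not the kernel's kref: the counter here is a plain integer with no atomics, and every sketch_* name is hypothetical.

```c
#include <stddef.h>

/* Simplified, non-atomic stand-in for a reference counter. */
struct sketch_ref {
	unsigned int count;
};

/* Hypothetical object whose visibility ends with its last reference. */
struct sketch_target {
	struct sketch_ref ref;
	int visible;
};

static void sketch_ref_init(struct sketch_ref *r)
{
	r->count = 1;	/* the creator holds the first reference */
}

static void sketch_ref_get(struct sketch_ref *r)
{
	r->count++;
}

/* Drop one reference; call release() and return 1 when it was the last. */
static int sketch_ref_put(struct sketch_ref *r,
			  void (*release)(struct sketch_ref *))
{
	if (--r->count == 0) {
		release(r);
		return 1;
	}
	return 0;
}

/* Release callback: recover the containing target and hide it. */
static void sketch_target_release(struct sketch_ref *r)
{
	struct sketch_target *t = (struct sketch_target *)
		((char *)r - offsetof(struct sketch_target, ref));

	t->visible = 0;
}
```

Because the teardown runs only inside the final put, a second user that still holds a reference keeps the object alive and visible, which is the property both SCSI fixes above depend on.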