Two Weight Problems and Bellman Functions on Filtered Probability Spaces

by Jingguo Lai
B.S., Fudan University; Shanghai, China, 2008
M.S., Michigan State University; East Lansing, MI, 2010

A Dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the Department of Mathematics at Brown University

Providence, Rhode Island
May 2015

© Copyright 2015 by Jingguo Lai

This dissertation by Jingguo Lai is accepted in its present form by the Department of Mathematics as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date    Sergei Treil, Advisor

Recommended to the Graduate Council

Date    Jill Pipher, Reader
Date    Brian Cole, Reader

Approved by the Graduate Council

Date    Peter M. Weber, Dean of the Graduate School

Curriculum Vita

Jingguo Lai was born in Shenyang, Liaoning, P.R. China on February 11, 1985 to Changbin Lai and Lijie Xue. He completed his B.S. at Fudan University in July 2008 and his M.S. at Michigan State University in July 2010. After graduating, he continued the study of mathematics at Brown University. He married Maggie Fong in the summer of 2013. Jingguo completed this thesis under the supervision of Sergei Treil.

Dedicated to my beloved parents: Changbin Lai and Lijie Xue

Acknowledgements

To my advisor, Prof. Sergei Treil, for his invaluable mentoring over the past five years. He suggested these problems, taught me the right way of thinking, and provided guidance and ideas for all the tough steps. To my readers, Prof. Jill Pipher and Prof. Brian Cole, for their careful reading, gentle criticism, and insightful edits. To Prof. Justin Holmer for his help and support along the way. To the wonderful mathematics department staff, particularly Audrey Aguiar, Larry Larivee, and Doreen Pappas. To my parents Changbin Lai and Lijie Xue, and my wife Maggie Fong, for their love and support. To my cat Gigi for bringing me so much fun.
Contents

Curriculum Vita
Dedication
Acknowledgements

Chapter 1. Introduction
1. Two Weight Problems
2. Bellman Functions on filtered probability spaces
3. Outline of the thesis

Chapter 2. Two weight estimates for a vector-valued positive operator
1. The case 1 < p ≤ q
2. The case q < p < ∞: a counterexample
3. The case q < p < ∞: necessity and sufficiency

Chapter 3. Two weight estimates for paraproducts
1. Construction of the stopping intervals
2. The case 1 < p ≤ 2
3. The case 2 < p < ∞: a counterexample
4. The case 2 < p < ∞: trilinear forms and necessity
5. The case 2 < p < ∞: from trilinear forms to shifted bilinear forms
6. The case 2 < p < ∞: sufficiency
6.1. A modified stopping interval construction
6.2. Estimation of T1
6.3. Estimation of T2

Chapter 4. Bellman functions on filtered probability spaces I: Burkhölder's hull and super-solutions
1. Properties of the Bellman function B(F, f, M; C)
2. Properties of the super-solutions
2.1. The super-solutions and the dyadic Carleson Embedding Theorem
2.2. Further properties of B(F, f, M; C)
2.3. Regularization of the super-solutions
2.4. The main inequality in its infinitesimal version
3. Finding a super-solution via the Burkhölder's hull
3.1. Burkhölder's hull and some reductions
3.2. The formula of the Burkhölder's hull and an explicit super-solution

Chapter 5. The Bellman functions on filtered probability spaces II: Remodeling and proof of the main theorems
1. Properties of the Bellman function B_{μF}(F, f, M; C)
2. Remodeling of the Bellman function B(F, f, M; C = 1) for an infinitely refining filtration
3. The Bellman function B_{μF}(F, f, M; C) of Theorem 1.13
3.1. B_{μF}(F, f, M) ≤ B(F, f, M)
3.2. B_{μF}(F, f, M) = B(F, f, M) for an infinitely refining filtration
4. The Bellman function B̃_{μF}(F, f) of the maximal operators
4.1. B̃_{μF}(F, f) ≤ B_{μF}(F, f, 1)
4.2. B̃_{μF}(F, f) = B_{μF}(F, f, 1) for an infinitely refining filtration

Bibliography

Abstract of "Two Weight Problems and Bellman Functions on filtered probability spaces" by Jingguo Lai, Ph.D., Brown University, May 2015

Chapter 1 provides the necessary background and states the main results of the thesis. Two separate topics are studied: the first is two weight problems; the second is Bellman functions on filtered probability spaces.

Chapter 2 proves a two weight estimate for a vector-valued positive operator. We consider two different cases of this theorem. The easier case 1 < p ≤ q requires only one testing condition. However, we construct a counterexample showing that this testing condition alone is not sufficient for the case q < p < ∞. We apply the Rubio de Francia Algorithm to reduce our problem to the well-known two weight estimates for positive dyadic operators.

Chapter 3 proves a two weight estimate for paraproducts. We again consider two different cases separately. The first few steps of the proof proceed exactly as in Chapter 2. However, for the harder case 2 < p < ∞, we need to characterize a two weight inequality for shifted bilinear forms, which takes up the majority of this chapter.

Chapter 4 considers the celebrated Dyadic Carleson Embedding Theorem. We streamline a way of finding a super-solution of the Bellman function via the Burkhölder's hull, giving an explicit formula for the Burkhölder's hull and hence an explicit super-solution.

Chapter 5 generalizes the Dyadic Carleson Embedding Theorem to filtered probability spaces and proves the coincidence of the Bellman functions for an infinitely refining filtration. The proof requires a remodeling of the Dyadic Carleson Embedding Theorem. Finally, we also consider the Bellman function of Doob's Martingale Inequality.

CHAPTER 1

Introduction

In this chapter we provide some useful background on the topics of this thesis.
First we establish the general setup of two weight problems and raise the questions of interest. Then we introduce a well-known application of the Bellman function techniques and pose the questions we want to solve. Further, we provide a brief outline of this thesis.

1. Two Weight Problems

The original question about two weight estimates is to find a necessary and sufficient condition on the weights (non-negative locally integrable functions) w and v such that an operator T : L^p(w) → L^p(v) is bounded for all 1 < p < ∞, i.e. the inequality

(1.1) ∫ |Tf|^p v dx ≤ C · ∫ |f|^p w dx, for f ∈ L^p(w).

Let u = w^{−p′/p}. A symmetric formulation of (1.1), well-known from the 80s, is

(1.2) ∫ |T(uf)|^p v dx ≤ C · ∫ |f|^p u dx, for f ∈ L^p(u).

(1.2) looks more natural than (1.1) in the two weight setting: in particular, if T is an integral operator, then the integration in the operator is performed with respect to the same measure u dx as in the domain. Denote μ = u dx and ν = v dx, and let T^μ(f) := T(μf). We can rewrite (1.2) as

(1.3) ∫ |T^μ(f)|^p dν ≤ C · ∫ |f|^p dμ, for f ∈ L^p(μ).

Two weight problems are notoriously hard. The first few results are for T being
• the Hardy operator, by Muckenhoupt [1];
• maximal operators, by Sawyer [2], where a testing condition is introduced;
• fractional integrals, by Sawyer and Wheeden [3] and [4].

For all these special operators, a characterization for all 1 < p < ∞ is given. In particular, fractional integrals are examples of positive operators, which are relatively easier and more or less solved. To highlight some results on discrete positive operators, we have results for T being
• positive dyadic operators and p = 2 in [5];
• positive dyadic operators and 1 < p < ∞ in [6];
• vector-valued positive dyadic operators and 1 < p < ∞ in [9].

Several simplified proofs of the results listed above have also been found. For example,
• [7] and [8] simplify the proof given in [6];
• [10] simplifies the proof given in [9].
In recent years, there has been a breakthrough on this problem for singular operators. Initiated by Nazarov, Treil and Volberg, and followed by Lacey, Sawyer, Uriarte-Tuero et al., we have the following results for T being
• Haar multipliers in [5], which is the first case of discrete singular operators;
• well localized operators, including Haar shifts, in [11];
• sufficient conditions for Calderón–Zygmund singular integral operators, and necessary and sufficient conditions for Calderón–Zygmund singular integral operators together with two maximal operators, in [12];
• the Hilbert Transform in [13] and [14];
• the Cauchy Transform in [15];
• the Riesz Transform in [16].

Two weight problems for general Calderón–Zygmund singular integral operators remain unsolved. Note that the original two weight problems (1.1), (1.2), (1.3) make sense for all 1 < p < ∞. However, the recent results in [5], [11]–[16] only consider p = 2.

The two weight problems we are interested in are both discrete ones. The first is a new two weight estimate for vector-valued positive operators for 1 < p < ∞. The second is a two weight estimate for paraproducts for 1 < p < ∞. Our setup follows the one in [17], which is more general than the dyadic case.

Definition 1.1. For a measurable space (X, T), a lattice L ⊆ T is a collection of measurable subsets of X with the following properties:
(i) L is a union of generations L_n, n ∈ Z, where each generation is a collection of disjoint measurable sets (call them intervals) covering X.
(ii) For each n ∈ Z, the covering L_{n+1} is a countable refinement of the covering L_n, i.e. each interval I ∈ L_n is a countable union of disjoint intervals J ∈ L_{n+1}. We allow the situation where there is only one such interval J, i.e. J = I; this means that I ∈ L_n also belongs to the generation L_{n+1}.

Definition 1.2. For an interval I ∈ L, let rk(I) be the rank of the interval I, i.e. the largest number n such that I ∈ L_n.
For an interval I ∈ L with rk(I) = n, a child of I is an interval J ∈ L_{n+1} such that J ⊆ I (actually, J ⊊ I). The collection of all children of I is denoted by child(I). Correspondingly, I is called the parent of J.

Definition 1.3. For a positive measure μ on (X, T), define the averaging operator as

(1.4) E_I^μ f = ⟨f⟩_{I,μ} 1_I = ( μ(I)^{−1} ∫_I f dμ ) 1_I,

where 1_I is the indicator function of the interval I. The martingale difference operator is then defined to be

(1.5) Δ_I^μ f = −E_I^μ f + Σ_{J∈child(I)} E_J^μ f.

From now on, we assume (X, T) is a measurable space, L ⊆ T is a lattice on X, and μ, ν are two positive measures. We denote the conjugate Hölder exponent of p by p′, where 1/p + 1/p′ = 1. Here, and throughout the thesis, we use the notation A ≲ B to mean that there exists an absolute constant C such that A ≤ CB, and we write A ≈ B if A ≲ B ≲ A.

Definition 1.4. Let α = {α_I : I ∈ L} be non-negative constants associated to a lattice L on (X, T). Define a vector-valued operator

(1.6) T_α^μ f = {α_I · E_I^μ f}_{I∈L}.

Theorem 1.5 (Two weight estimates for a vector-valued positive operator [18]). Let 1 < p < ∞ and 1 ≤ q < ∞. The inequality

(1.7) ∫_X [ Σ_{I∈L} (α_I · E_I^μ f)^q ]^{p/q} dν ≤ C^p · ∫_X |f|^p dμ

holds if and only if
(i) for the case 1 < p ≤ q, we have

(1.8) ∫_J [ Σ_{I∈L: I⊆J} α_I^q · 1_I ]^{p/q} dν ≤ C_1^p · μ(J), J ∈ L;

(ii) for the case q < p < ∞, we have both (1.8) and

(1.9) ∫_J [ Σ_{I∈L: I⊆J} α_I^q · (ν(I)/μ(I)) · 1_I ]^{(p/q)′} dμ ≤ C_2^{q(p/q)′} · ν(J), J ∈ L.

In particular, C ≈ C_1 + C_2.

Remark 1.6. Vector-valued positive operators in this thesis are viewed as a simple model of the paraproducts defined below.

Definition 1.7. For a measurable function b, the paraproduct operator with symbol b is

(1.10) π_b^μ f = Σ_{I∈L} (E_I^μ f)(Δ_I^ν b).

Paraproducts play an important role in the investigation of the weighted inequalities for singular integral operators.
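The operators just defined are easy to experiment with in the dyadic model. The sketch below (an illustrative discretization, not part of the text: the grid size N, the list-based measures, and all function names are assumptions) implements the averaging operator (1.4), the martingale difference (1.5), and the paraproduct (1.10) on the standard dyadic lattice of [0, 1):

```python
import random

# Minimal sketch on the dyadic lattice of [0, 1), discretized into 2**N cells.
# A measure is a list of cell masses; the interval at level n with index j
# is I = [j 2^-n, (j+1) 2^-n).  All names here are ad hoc.

N = 8  # finest dyadic level

def avg(f, mass, n, j):
    # <f>_{I,mu} for I = [j 2^-n, (j+1) 2^-n)
    w = 2 ** (N - n)
    cells = range(j * w, (j + 1) * w)
    m = sum(mass[i] for i in cells)
    return sum(f[i] * mass[i] for i in cells) / m if m > 0 else 0.0

def E(f, mass, n, j):
    # averaging operator E_I^mu f = <f>_{I,mu} 1_I, as a grid function (1.4)
    w = 2 ** (N - n)
    a = avg(f, mass, n, j)
    return [a if j * w <= i < (j + 1) * w else 0.0 for i in range(2 ** N)]

def Delta(f, mass, n, j):
    # martingale difference Delta_I^mu f = -E_I^mu f + sum over children (1.5)
    out = [-x for x in E(f, mass, n, j)]
    for jc in (2 * j, 2 * j + 1):          # the two dyadic children of I
        out = [a + b for a, b in zip(out, E(f, mass, n + 1, jc))]
    return out

def paraproduct(f, b, mu, nu):
    # pi_b^mu f = sum_I (E_I^mu f)(Delta_I^nu b)   (definition (1.10))
    out = [0.0] * (2 ** N)
    for n in range(N):
        for j in range(2 ** n):
            a = avg(f, mu, n, j)
            d = Delta(b, nu, n, j)
            out = [x + a * y for x, y in zip(out, d)]
    return out

random.seed(0)
mu = [1.0 / 2 ** N] * 2 ** N               # uniform (Lebesgue) cell masses
f = [random.random() for _ in range(2 ** N)]
d = Delta(f, mu, 3, 5)
print(abs(sum(x * m for x, m in zip(d, mu))) < 1e-12)   # -> True: mean zero
```

A useful structural check: each Δ_I^μ f integrates to zero against μ, and summing the differences over all levels, together with the top-level average E_{[0,1)}^μ f, reconstructs f on the grid.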
The L^2-boundedness of paraproducts is easy: a necessary and sufficient condition, (1.12) with p = 2, follows immediately from the Carleson Embedding Theorem. This necessary and sufficient condition can be stated as a testing condition, i.e. a paraproduct is bounded in L^2 if and only if there is a uniform estimate on all intervals. In the classical non-weighted situation, L^2-boundedness is equivalent to boundedness of the paraproduct in all L^p, 1 < p < ∞.

The weighted situation is much more interesting. It was shown in [17] that in the one weight situation, the testing condition (1.12) is still necessary and sufficient, but it now depends on p: boundedness in L^{p_0} implies boundedness in L^p with 1 < p ≤ p_0, but not in L^p with p_0 < p < ∞. The two weight case becomes even more interesting: while it is not hard to show that for p ≤ 2 the testing condition is still sufficient for boundedness, we will present a counterexample showing that for p > 2 the testing condition alone does not work.

Theorem 1.8 (Two weight estimates for paraproducts [19]). The inequality

(1.11) ‖π_b^μ f‖_{L^p(ν)}^p ≤ C^p · ‖f‖_{L^p(μ)}^p

holds if and only if
(i) for the case 1 < p ≤ 2, we have

(1.12) ∫_J [ Σ_{I∈L, I⊆J} |Δ_I^ν b|^2 ]^{p/2} dν ≤ C_1^p · μ(J), J ∈ L;

(ii) for the case 2 < p < ∞, we have both (1.12) and

(1.13) ∫_{J′} [ Σ_{I∈L} Σ_{I′∈child(I), I′⊊J′} (ν(I′)/μ(I)) · E_{I′}^ν (Δ_I^ν b)^2 · 1_I ]^{(p/2)′} dμ ≤ C_2^{2(p/2)′} · ν(J′), J ∈ L, J′ ∈ child(J).

In particular, C ≈ C_1 + C_2.

2. Bellman Functions on filtered probability spaces

Denote the Lebesgue measure of a set E by |E|, and the average value of f on an interval I by ⟨f⟩_I. The celebrated dyadic Carleson Embedding Theorem states:

Theorem 1.9 (Dyadic Carleson Embedding Theorem). Let D = {([0, 1) + j) · 2^k : j, k ∈ Z} be the standard dyadic lattice on R, and let {α_I}_{I∈D} be a sequence of non-negative numbers satisfying the Carleson condition: Σ_{J∈D, J⊆I} α_J ≤ C|I| for all dyadic intervals I ∈ D.
Then the embedding

(1.14) Σ_{I∈D} α_I |⟨f⟩_I|^p ≤ C_p · C · ‖f‖_{L^p}^p

holds for all f ∈ L^p, where p > 1. Moreover, the constant C_p = (p′)^p is sharp (cannot be replaced by a smaller one).

One approach to proving Theorem 1.9 is the introduction of the Bellman function. Without loss of generality, we can assume f ≥ 0. Following [20] and [21], we define the Bellman function in three variables (F, f, M) as

(1.15) B(F, f, M; C) = sup{ |J|^{−1} Σ_{I∈D, I⊆J} α_I ⟨f⟩_I^p : f, {α_I}_{I∈D} satisfy (i), (ii), (iii), and (iv) },

(i) ⟨f^p⟩_J = F; (ii) ⟨f⟩_J = f; (iii) |J|^{−1} Σ_{I⊆J} α_I = M; (iv) Σ_{I⊆J} α_I ≤ C|J| for all J ∈ D.

Note that the Bellman function B(F, f, M; C) defined above does not depend on the choice of the interval J.

In [20], (1.14) was first proved using the Bellman function method for the case p = 2, and in [21], the sharpness for the case p = 2 was also claimed. Later, A. Melas found in [22] the exact Bellman function for all p > 1 in a tree-like setting using combinatorial and covering reasoning. In [23], an alternative way of finding the exact Bellman function, based on the Monge–Ampère equation, was also established.

The Bellman functions have deep connections to Stochastic Optimal Control theory [21]. Finding the exact Bellman functions is a difficult task: both the combinatorial methods in [22] and the methods of solving the Bellman PDE in [23] are quite complicated. Luckily, the proof of Theorem 1.9 only needs a super-solution instead of the exact Bellman function; see [20], [21]. In this thesis, we will present a way of calculating a super-solution via the Burkhölder's hull.

On the other hand, computation of the exact Bellman functions usually reflects deeper structure of the corresponding harmonic analysis problem. It is interesting to note that the exact Bellman function of Theorem 1.9 is not restricted to the standard dyadic lattice: in [22], it also works for the tree-like structure. Let us consider a more general situation here.
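Before generalizing, the embedding (1.14) can be sanity-checked numerically on a finite dyadic tree. In the sketch below, the truncation level N, the particular Carleson sequence α_I = |I|/(N + 1), and the use of Lebesgue measure are all illustrative assumptions; with this choice the Carleson condition holds with C = 1, so the right-hand side of (1.14) is (p′)^p · ‖f‖_{L^p}^p:

```python
import random

# Sanity check of the dyadic Carleson embedding (1.14) on the finite dyadic
# tree of [0, 1) truncated at level N.  With alpha_I = |I| / (N + 1) for
# every I, the Carleson condition  sum_{J <= I} alpha_J <= |I|  holds (C = 1),
# since each of the N + 1 levels inside I contributes at most |I| / (N + 1).

N = 10
GRID = 2 ** N

def mean(f, n, j):
    # <f>_I over I = [j 2^-n, (j+1) 2^-n), f given on the level-N grid
    w = 2 ** (N - n)
    return sum(f[j * w:(j + 1) * w]) / w

def embedding_sum(f, p):
    # left-hand side of (1.14): sum_I alpha_I <f>_I^p
    total = 0.0
    for n in range(N + 1):
        alpha = 2.0 ** (-n) / (N + 1)      # alpha_I = |I| / (N + 1)
        for j in range(2 ** n):
            total += alpha * mean(f, n, j) ** p
    return total

def lp_norm_p(f, p):
    # ||f||_{L^p}^p for the step function f on the grid
    return sum(x ** p for x in f) / GRID

random.seed(1)
f = [random.random() for _ in range(GRID)]
p = 3.0
q = p / (p - 1)                            # conjugate exponent p'
lhs = embedding_sum(f, p)
rhs = q ** p * lp_norm_p(f, p)             # C_p * C * ||f||_p^p with C = 1
print(lhs <= rhs)                          # -> True
```

That the check must pass follows already from Jensen's inequality, ⟨f⟩_I^p ≤ ⟨f^p⟩_I; the point of Theorem 1.9 is the sharp constant (p′)^p over all admissible Carleson sequences.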
Let (X, F, {F_n}_{n≥0}, μ) be a discrete-time filtered probability space. By a discrete-time filtration, we mean a sequence of non-decreasing σ-fields {∅, X} = F_0 ⊆ F_1 ⊆ ... ⊆ F_n ⊆ ... ⊆ F. We introduce the notations f_n = E^μ[f | F_n] and ⟨f⟩_{E,μ} = μ(E)^{−1} ∫_E f dμ.

Definition 1.10. A sequence of non-negative random variables {α_n}_{n≥0} is called a Carleson sequence if each α_n is F_n-measurable and

(1.16) E^μ[ Σ_{k≥n} α_k | F_n ] ≤ C for every n ≥ 0.

Definition 1.11. {F_n}_{n≥0} is called an infinitely refining filtration if for every ε > 0, every n ≥ 0 and every set E ∈ F_n, there exists a real-valued F_k-measurable (k > n) random variable h such that: (i) |h| · 1_E = 1_E and (ii) ∫_E |h_n| dμ ≤ ε.

Theorem 1.12 (Martingale Carleson Embedding Theorem). If f ∈ L^p(X, F, μ) and {α_n}_{n≥0} is a Carleson sequence, then

(1.17) E^μ[ Σ_{n≥0} α_n |f_n|^p ] ≤ C_p · C · E^μ[|f|^p].

Moreover, if {F_n}_{n≥0} is an infinitely refining filtration, then the constant C_p = (p′)^p is sharp.

Here, again without loss of generality, we can assume f ≥ 0. We define the Bellman function B_{μF}(F, f, M; C) in the martingale setting by

(1.18) B_{μF}(F, f, M; C) = sup{ E^μ[ Σ_{n≥0} α_n f_n^p ] : f, {α_n}_{n≥0} satisfy (i), (ii), (iii) and (iv) },

(i) E^μ[f^p] = F; (ii) E^μ[f] = f; (iii) E^μ[ Σ_{n≥0} α_n ] = M; (iv) {α_n}_{n≥0} satisfies (1.16).

Now, we are ready to state the first main theorem.

Theorem 1.13 (Coincidence of the Bellman functions [24]).

(1.19) B_{μF}(F, f, M; C) ≤ B(F, f, M; C).

Moreover, if {F_n}_{n≥0} is an infinitely refining filtration, then

(1.20) B_{μF}(F, f, M; C) = B(F, f, M; C).

For Doob's martingale inequality, recall the definition of the maximal function associated to a discrete-time filtration {F_n}_{n≥0}:

(1.21) f*(x) = sup_{n≥0} |f_n(x)|.

Theorem 1.14 (Doob's Martingale Inequality). For every p > 1 and every f ∈ L^p(X, F, μ), we have

(1.22) ‖f*‖_{L^p(X,F,μ)}^p ≤ (p′)^p · ‖f‖_{L^p(X,F,μ)}^p.

Moreover, if {F_n}_{n≥0} is an infinitely refining filtration, then the constant (p′)^p is sharp.

The study of the L^p-norm of the maximal function was initiated by the celebrated Doob's martingale inequality, e.g. in [30]. The sharpness of this inequality was shown in [26] and [27] if one looks at all martingales. For particular martingales, including the dyadic case, see [22] and [28]. Theorem 1.14 covers all these results.

Assuming f ≥ 0, we define the Bellman function B̃_{μF}(F, f) associated to Doob's martingale inequality by

(1.23) B̃_{μF}(F, f) = sup{ E^μ[|f*|^p] : E^μ[f^p] = F, E^μ[f] = f }.

The connection between the Carleson Embedding Theorem and the maximal theory has been known and exploited a lot, e.g. in [20] and [22]. Using this connection, we give a proof of the second main theorem.

Theorem 1.15 (The Bellman function of the maximal operators [24]).

(1.24) B̃_{μF}(F, f) ≤ B_{μF}(F, f, 1; C = 1).

Moreover, if {F_n}_{n≥0} is an infinitely refining filtration, then

(1.25) B̃_{μF}(F, f) = B_{μF}(F, f, 1; C = 1).

3. Outline of the thesis

In Chapter 2, we prove Theorem 1.5. We discuss the two cases 1 < p ≤ q and q < p < ∞ separately. The theorem is actually equivalent to two weight estimates for positive dyadic operators.

In Chapter 3, we prove Theorem 1.8. Again, we discuss the two cases 1 < p ≤ 2 and 2 < p < ∞ separately. The sufficiency part of the case 2 < p < ∞ needs to be done in greater detail.

In Chapter 4, we find a super-solution of Theorem 1.9 via the Burkhölder's hull, which proves the existence of the Bellman function B(F, f, M; C).

In Chapter 5, we present a remodeling of the Bellman function B(F, f, M; C) and use it to prove the two main results, Theorem 1.13 and Theorem 1.15.

CHAPTER 2

Two weight estimates for a vector-valued positive operator

In this chapter, we prove Theorem 1.5. We first discuss the easier case 1 < p ≤ q. Then we present a counterexample showing that (1.8) by itself does not imply (1.7).
Eventually, we reduce Theorem 1.5 to the well-known two weight estimates for positive dyadic operators and complete our proof. A comparison of our theorem with the main results in [9] and [10] is also given.

1. The case 1 < p ≤ q

We will see in this section that when 1 < p ≤ q, (1.8) is equivalent to (1.7). On one hand, (1.8) can be deduced from (1.7) by setting f = 1_J. On the other hand, consider the maximal function

(2.1) M^μ f(x) := sup_{x∈I, I∈L} |E_I^μ f(x)|.

The celebrated Doob's martingale inequality asserts

(2.2) ‖M^μ f‖_{L^p(μ)} ≤ p′ · ‖f‖_{L^p(μ)}.

Let E_k := {x ∈ X : M^μ f(x) > 2^k} and let 𝓔_k := {I ∈ L : I ⊆ E_k}. Note that E_k is a disjoint union of maximal intervals in 𝓔_k, maximal in the sense of inclusion. Denote the collection of these disjoint maximal intervals by 𝓔_k^*, so E_k = ∪_{J∈𝓔_k^*} J. Then, since every I ∈ L lies in exactly one 𝓔_k \ 𝓔_{k+1}, and E_I^μ f ≤ 2^{k+1} on I for such I,

∫_X [ Σ_{I∈L} (α_I · E_I^μ f)^q ]^{p/q} dν ≤ Σ_k ∫_{E_k} [ Σ_{I∈𝓔_k\𝓔_{k+1}} (α_I · E_I^μ f)^q ]^{p/q} dν   (since 1 < p ≤ q)

≤ Σ_k 2^{(k+1)p} ∫_{E_k} [ Σ_{I∈𝓔_k\𝓔_{k+1}} α_I^q · 1_I ]^{p/q} dν

≤ Σ_k 2^{(k+1)p} Σ_{J∈𝓔_k^*} ∫_J [ Σ_{I∈L: I⊆J} α_I^q · 1_I ]^{p/q} dν

≤ C_1^p · Σ_k 2^{(k+1)p} · μ(E_k)   (by (1.8))

≲ C_1^p · ‖M^μ f‖_{L^p(μ)}^p ≤ C_1^p · (p′)^p · ‖f‖_{L^p(μ)}^p   (by (2.2)).

2. The case q < p < ∞: a counterexample

In this section, we see that (1.8) by itself is not sufficient for (1.7) in the case q < p < ∞. Consider the real line R with the Borel σ-algebra B(R). Let the lattice be all the triadic intervals. We specify the positive measures μ, ν, the non-negative constants α = {α_I : I ∈ L}, and the function f in the following way.

Let C = ∩_{n≥0} C_n be the 1/3-Cantor set, where C_0 = [0, 1), C_1 = [0, 1/3) ∪ [2/3, 1) and, in general, C_n = ∪ { [x, x + 3^{−n}) : x = Σ_{j=1}^n ε_j 3^{−j}, ε_j ∈ {0, 2} }.

(i) The measure μ is the Lebesgue measure restricted to [0, 1) and the measure ν is the Cantor measure, i.e. ν(I) = 2^{−n} for each I belonging to a connected component of C_n.
(ii) Define α_I = (2/3)^{n/p} for each I belonging to a connected component of C_n.
(iii) For the function f, consider the gap of C, i.e. [0, 1) \ C.
This is a disjoint union of triadic intervals. Let f = (3/2)^{n/p} · n^{−r} on each interval I ⊆ [0, 1) \ C with |I| = 3^{−n}, where r is to be chosen later.

Claim 2.1. The construction gives a counterexample with properly chosen r.

Proof. We begin with checking (1.8). It suffices to check for every J belonging to a connected component of C_n, and thus μ(J) = 3^{−n}. Note that

[ Σ_{I∈L: I⊆J} α_I^q · 1_I ]^{p/q} ≤ [ Σ_{k≥n} (2/3)^{kq/p} ]^{p/q} ≲ (2/3)^n.

Hence,

∫_J [ Σ_{I∈L: I⊆J} α_I^q · 1_I ]^{p/q} dν ≲ (2/3)^n · ν(J) = μ(J).

Next, we show that (1.7) fails. This requires a careful choice of r in the definition of f. Picking r > 1/p, we have

‖f‖_{L^p(μ)}^p = ∫_0^1 |f|^p dx = Σ_{n≥1} n^{−pr} · (3/2)^n · 3^{−n} · 2^n = Σ_{n≥1} n^{−pr} < ∞.

Since q < p < ∞, we can pick r such that 1/p < r < 1/q. Note that for every I belonging to a connected component of C_n, we have

E_I^μ f ≥ (1/3) · (3/2)^{(n+1)/p} · (n + 1)^{−r}.

Hence, considering I_n = {I : I is triadic with length at least 3^{−n}}, we have

Σ_{I∈I_n} (α_I · E_I^μ f)^q · 1_{C_n} ≥ Σ_{k≤n} [ (1/3) · (3/2)^{1/p} · (k + 1)^{−r} ]^q ≳ Σ_{k≤n} (k + 1)^{−qr}.

And so,

∫ [ Σ_{I∈I_n} (α_I · E_I^μ f)^q ]^{p/q} dν ≳ [ Σ_{k≤n} (k + 1)^{−qr} ]^{p/q} · ν(C_n) → ∞ as n → ∞.

We can see that the condition q < p < ∞ is crucial in our construction. □

3. The case q < p < ∞: necessity and sufficiency

We discuss the case q < p < ∞ of Theorem 1.5 in this section. In particular, we see that both (1.8) and (1.9) are testing conditions on some families of special functions. To start, since

(2.3) ‖T_α^μ f‖_{L^p(ℓ^q,ν)}^q = sup_{‖g‖_{L^{(p/q)′}(ν)}=1} ∫_X [ Σ_{I∈L} (α_I · E_I^μ f)^q ] g dν,

we can write

(2.4) ‖T_α^μ‖_{L^p(μ)→L^p(ℓ^q,ν)}^q = sup_{‖f‖_{L^p(μ)}=1} sup_{‖g‖_{L^{(p/q)′}(ν)}=1} ∫_X [ Σ_{I∈L} (α_I · E_I^μ f)^q ] g dν.

Without loss of generality, we assume that both f and g are non-negative. The following lemma reduces us to the scalar-valued case.

Lemma 2.2.

(2.5) ‖T_α^μ‖_{L^p(μ)→L^p(ℓ^q,ν)}^q ≈ sup_{‖f‖_{L^p(μ)}=1} sup_{‖g‖_{L^{(p/q)′}(ν)}=1} ∫_X [ Σ_{I∈L} α_I^q · E_I^μ(f^q) ] g dν.

An easy application of Hölder's inequality shows that the LHS of (2.5) is no more than its RHS. The other half of this lemma depends on the following famous Rubio de Francia Algorithm.

Lemma 2.3 (Rubio de Francia Algorithm). For every q < p < ∞ and f ∈ L^p(μ), there exists a function F ∈ L^p(μ) such that f ≤ F, ‖F‖_{L^p(μ)} ≈ ‖f‖_{L^p(μ)} and

μ(I)^{−1} ∫_I F^q dμ ≲ inf_{x∈I} F^q(x), I ∈ L.

Proof. Consider the maximal operator M^μ defined in (2.1). Doob's martingale inequality (2.2) implies

(2.6) ‖M^μ‖_{L^{p/q}(μ)→L^{p/q}(μ)} ≤ (p/q)′.

Write ‖M^μ‖ for ‖M^μ‖_{L^{p/q}(μ)→L^{p/q}(μ)}. Denote M^μ(0) = Id, M^μ(1) = M^μ and M^μ(k) = M^μ ∘ M^μ(k−1). Define the function F by

(2.7) F = [ Σ_{k≥0} (2‖M^μ‖)^{−k} M^μ(k)(f^q) ]^{1/q}.

First we check the validity of the definition of F. Note that

‖F‖_{L^p(μ)}^q = [ ∫_X ( Σ_{k≥0} (2‖M^μ‖)^{−k} M^μ(k)(f^q) )^{p/q} dμ ]^{q/p}
≤ Σ_{k≥0} (2‖M^μ‖)^{−k} [ ∫_X (M^μ(k)(f^q))^{p/q} dμ ]^{q/p}   (Minkowski's inequality)
≤ Σ_{k≥0} (2‖M^μ‖)^{−k} ‖M^μ‖^k ‖f‖_{L^p(μ)}^q = 2‖f‖_{L^p(μ)}^q.

Hence, F is the L^{p/q}(μ)-limit of the partial sums and thus well-defined. Moreover, we have also proved that ‖F‖_{L^p(μ)} ≲ ‖f‖_{L^p(μ)}. Considering only k = 0 in the definition of F, we have F ≥ f, and so ‖F‖_{L^p(μ)} ≈ ‖f‖_{L^p(μ)}. Finally, note that

(2.8) μ(I)^{−1} ∫_I F^q dμ ≤ inf_{x∈I} M^μ(F^q)(x)

and

M^μ(F^q) ≤ Σ_{k≥0} (2‖M^μ‖)^{−k} M^μ(k+1)(f^q) = 2‖M^μ‖ · (F^q − f^q) ≲ F^q.

Therefore, we deduce

μ(I)^{−1} ∫_I F^q dμ ≲ inf_{x∈I} F^q(x), I ∈ L. □

Applying the Rubio de Francia Algorithm, we obtain

∫_X [ Σ_{I∈L} α_I^q · E_I^μ(f^q) ] g dν ≤ ∫_X [ Σ_{I∈L} α_I^q · E_I^μ(F^q) ] g dν
≲ ∫_X [ Σ_{I∈L} (α_I · E_I^μ F)^q ] g dν
≤ ‖T_α^μ‖_{L^p(μ)→L^p(ℓ^q,ν)}^q · ‖F‖_{L^p(μ)}^q · ‖g‖_{L^{(p/q)′}(ν)}   (by (2.4))
≲ ‖T_α^μ‖_{L^p(μ)→L^p(ℓ^q,ν)}^q   (since ‖f‖_{L^p(μ)} = ‖g‖_{L^{(p/q)′}(ν)} = 1).

Now that our problem is reduced to determining a necessary and sufficient condition for

(2.9) ∫_X [ Σ_{I∈L} α_I^q · E_I^μ(f^q) ]^{p/q} dν ≲ C^p · ∫_X |f|^p dμ,

we may appeal to the scalar-valued Theorem 2.4 below, applied with the exponent p/q and the coefficients α_I^q. Therefore, Theorem 1.5 follows from Theorem 2.4 for free, and both (1.8) and (1.9) are testing conditions with respect to this derived scalar-valued problem.

Consider the linear operator defined by

(2.10) T_α^μ f := Σ_{I∈L} α_I · E_I^μ f.

Theorem 2.4. Let 1 < p < ∞ and let 1/p + 1/p′ = 1. Then T_α^μ : L^p(μ) → L^p(ν) is bounded if and only if

(2.11) ∫_J [ Σ_{I∈L: I⊆J} α_I · 1_I ]^p dν ≤ C_1^p · μ(J), J ∈ L,

(2.12) ∫_J [ Σ_{I∈L: I⊆J} α_I · (ν(I)/μ(I)) · 1_I ]^{p′} dμ ≤ C_2^{p′} · ν(J), J ∈ L.

In particular, ‖T_α^μ‖_{L^p(μ)→L^p(ν)} ≈ C_1 + C_2.

Remark 2.5. Theorem 2.4 was originally proved for the dyadic case. This general version is explained in [7].

Remark 2.6. In [9] and [10], to obtain the two testing conditions, they first rewrite (1.7) as

(2.13) Σ_{I∈L} α_I · E_I^μ f · E_I^ν g_I · ν(I) ≤ C ‖f‖_{L^p(μ)} · ‖{g_I}_{I∈L}‖_{L^{p′}(ℓ^{q′},ν)}.

Setting f = 1_J, one deduces (1.8). For the second testing condition, one turns to considering the family of functions {g_I}_{I∈L} supported on J ∈ L with L^∞(ℓ^{q′}, ν)-norm equal to 1. This gives

(2.14) ∫_J [ Σ_{I∈L: I⊆J} α_I · (ν(I)/μ(I)) · E_I^ν g_I ]^{p′} dμ ≤ C^{p′} · ν(J), J ∈ L.

Compare Theorem 1.5 with the main results in [9] and [10]: we have a very different condition (1.9) than (2.14), with seemingly 'wrong' exponents. However, both (1.8) and (1.9) are testing conditions on some families of special functions, as we have shown in this chapter.

CHAPTER 3

Two weight estimates for paraproducts

In this chapter, we prove Theorem 1.8. We first follow the line of Chapter 2, but for the sufficiency part of the case 2 < p < ∞, we need to be more careful.

We start with some useful reductions. Obviously, it suffices to consider only non-negative functions f ≥ 0 in Theorem 1.8.
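As an aside, the square function S^ν f = ( Σ_{I∈L} |Δ_I^ν f|^2 )^{1/2} used in the Littlewood–Paley reduction recalled below can be checked numerically at p = 2, where the equivalence is an exact identity coming from the orthogonality of martingale differences. The sketch (a finite dyadic discretization with an assumed random measure ν; all names are illustrative) verifies ∫ (S^ν f)^2 dν = ∫ |f − ⟨f⟩|^2 dν:

```python
import random

# Finite dyadic model of the square function S f = (sum_I |Delta_I f|^2)^{1/2}
# on [0, 1); checks the p = 2 identity
#     int (S f)^2 dnu = int |f - <f>|^2 dnu,
# i.e. the orthogonality of martingale differences.  The discretization is
# illustrative, not from the text.

N = 6
GRID = 2 ** N

def avg(f, nu, n, j):
    # <f>_{I,nu} for I = [j 2^-n, (j+1) 2^-n)
    w = 2 ** (N - n)
    cells = range(j * w, (j + 1) * w)
    m = sum(nu[i] for i in cells)
    return sum(f[i] * nu[i] for i in cells) / m

def delta(f, nu, n, j):
    # Delta_I f as a grid function supported on I
    w2 = 2 ** (N - n - 1)
    out = [0.0] * GRID
    aI = avg(f, nu, n, j)
    for jc in (2 * j, 2 * j + 1):          # the two children of I
        aJ = avg(f, nu, n + 1, jc)
        for i in range(jc * w2, (jc + 1) * w2):
            out[i] = aJ - aI
    return out

random.seed(2)
nu = [random.random() for _ in range(GRID)]   # random positive cell masses
f = [random.random() for _ in range(GRID)]

sq = [0.0] * GRID                             # pointwise (S f)^2
for n in range(N):
    for j in range(2 ** n):
        d = delta(f, nu, n, j)
        sq = [s + x * x for s, x in zip(sq, d)]

lhs = sum(s * m for s, m in zip(sq, nu))               # int (Sf)^2 dnu
a0 = avg(f, nu, 0, 0)
rhs = sum((x - a0) ** 2 * m for x, m in zip(f, nu))    # int |f - <f>|^2 dnu
print(abs(lhs - rhs) < 1e-9)                           # -> True
```

For p ≠ 2 the identity becomes a two-sided norm equivalence, which is the content of Theorem 3.1 below.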
Moreover, recall the following version of the Littlewood–Paley theorem.

Theorem 3.1 (Littlewood–Paley). If a function f has a Littlewood–Paley decomposition f = Σ_{I∈L} Δ_I^ν f, and we define its square function to be S^ν f = ( Σ_{I∈L} |Δ_I^ν f|^2 )^{1/2}, then ‖f‖_{L^p(ν)} ≈ ‖S^ν f‖_{L^p(ν)} for all 1 < p < ∞.

Proof. See [17]. □

Applying this theorem, (1.11) is equivalent to

(3.1) ‖S^ν(π_b^μ f)‖_{L^p(ν)}^p = ∫_X [ Σ_{I∈L} (E_I^μ f)^2 |Δ_I^ν b|^2 ]^{p/2} dν ≲ C^p ∫_X |f|^p dμ.

We will consider (3.1) instead of (1.11) in what follows.

1. Construction of the stopping intervals

Let us construct a collection G ⊆ F ⊆ L of stopping intervals as follows. Fix a non-negative function f ≥ 0. For J ∈ F, let G*(J) be the collection of maximal intervals I ∈ F, I ⊆ J, such that ⟨f⟩_{I,μ} > 2⟨f⟩_{J,μ}. In case there is more than one such interval I, we choose the one from the smallest generation. Note that the intervals from G*(J) are pairwise disjoint. Let F(J) = {I ∈ F : I ⊆ J} and let G(J) = ∪_{I∈G*(J)} I. Define also E(J) = F(J) \ ∪_{I∈G*(J)} F(I). Then we have the following properties:
(i) for any I ∈ E(J), ⟨f⟩_{I,μ} ≤ 2⟨f⟩_{J,μ};
(ii) μ(G(J)) < (1/2)μ(J).

To construct the collection G, fix some large integer N ∈ Z and consider all maximal intervals J from {L_k}_{k≥−N} with J ∈ F. These intervals form the first generation G_1^* of stopping intervals. Inductively define the (n+1)-th generation of stopping intervals by G*_{n+1} = ∪_{I∈G_n^*} G*(I), and define the collection of stopping intervals by G = ∪_{n≥1} G_n^*. Property (ii) implies that the collection G of stopping intervals satisfies the famous Carleson measure condition

(3.2) Σ_{I∈G, I⊆J} μ(I) < 2μ(J), J ∈ L.

A special form of the Martingale Carleson Embedding Theorem 1.12 says:

Theorem 3.2. Let μ be a measure on (X, T) and let α_I ≥ 0, I ∈ L, satisfy the Carleson measure condition

(3.3) Σ_{I∈L: I⊆J} α_I ≤ C · μ(J), J ∈ L.

Then for any measurable function f and any 1 < p < ∞,

(3.4) Σ_{I∈L} α_I |⟨f⟩_{I,μ}|^p ≤ C · (p′)^p · ‖f‖_{L^p(X,T,μ)}^p.

2. The case 1 < p ≤ 2

We will see in this section that when 1 < p ≤ 2, (1.12) is equivalent to (3.1). On one hand, (1.12) can be deduced from (3.1) by setting f = 1_J. On the other hand, we can apply the stopping intervals with F = L constructed in the previous section: every interval I ∈ L contained in an interval of generation L_{−N} belongs to exactly one E(J), J ∈ G, so

∫_X [ Σ_{J∈G} Σ_{I∈E(J)} (E_I^μ f)^2 |Δ_I^ν b|^2 ]^{p/2} dν
≤ ∫_X [ Σ_{J∈G} 4⟨f⟩_{J,μ}^2 Σ_{I∈E(J)} |Δ_I^ν b|^2 ]^{p/2} dν   (by property (i))
≤ 2^p Σ_{J∈G} ⟨f⟩_{J,μ}^p ∫_J [ Σ_{I∈E(J)} |Δ_I^ν b|^2 ]^{p/2} dν   (since 1 < p ≤ 2)
≤ 2^p Σ_{J∈G} ⟨f⟩_{J,μ}^p · C_1^p · μ(J)   (by (1.12))
≤ C_1^p · 2^{p+1} · (p′)^p · ‖f‖_{L^p(X,T,μ)}^p   (by (3.2) and (3.4)).

Letting N → ∞, we obtain exactly (3.1). We can see that 1 < p ≤ 2 plays an important role in this argument; there is no analogue for the case 2 < p < ∞.

3. The case 2 < p < ∞: a counterexample

In this section, we see that (1.12) by itself is not sufficient for (3.1) in the case 2 < p < ∞. Consider the real line R with the Borel σ-algebra B(R). Let the lattice be all the triadic intervals. We specify the measures μ, ν and the functions b, f in the following way.

Let C = ∩_{n≥0} C_n be the 1/3-Cantor set, where C_0 = [0, 1), C_1 = [0, 1/3) ∪ [2/3, 1) and, in general, C_n = ∪ { [x, x + 3^{−n}) : x = Σ_{j=1}^n ε_j 3^{−j}, ε_j ∈ {0, 2} }.

(i) The measure μ is the Lebesgue measure restricted to [0, 1) and the measure ν is the Cantor measure, i.e. ν(I) = 2^{−n} for each I belonging to a connected component of C_n.
(ii) For the function b, we specify its martingale differences Δ_I^ν b. Let |Δ_I^ν b| = (2/3)^{n/p} on each I belonging to a connected component of C_n, chosen so that ∫_I (Δ_I^ν b) dν = 0.
It suffices to check for every J belongs to a connected component of Cn , and thus µ(J) = 3−n . Note that " #p " 2 2   2kp # p2  n X X 2 2 ∆ν b ≤ . . I⊆J I k≥n 3 3 Hence, Z "X #p  n  n  n 2 2 2 2 1 ν ∆ b dν . ν(J) = · = µ(J). J I⊆J I 3 3 2 Next, we show that (3.1) fails. This requires a careful choice of r in the definition of f . Picking r > p1 , we have Z 1 X  3 n 1 X ||f || p = p |f | dx = n−pr · n · 2n = n−pr < ∞. Lp (µ) 0 n≥1 2 3 n≥1 1 Since 2 < p < ∞, we can pick r such that p < r < 12 . Note that for every I belongs to a connected component of Cn , we have   n+1 1 3 p µ E f≥ (n + 1)−r . I 3 2 Hence, consider In = {I : I is tri-adic with length less than or equal to 3−n },  1 2 X 2 2 X 1 3 p X −r µ ν f ∆ b · 1 ≥ (k + 1) & (k + 1)−2r . E 3 2 Cn I I I∈In k≤n k≤n And so, Z "X #p " # p2 2 2 2 X (k + 1)−2r · ν(Cn ) → ∞ as n → ∞. µ ν E f ∆ b dν & I I I∈In k≤n  20 4. The case 2 < p < ∞: trilinear forms and necessity We discuss the necessity of Theorem 1.8 in this section. In particular, we see that both (1.12) and (1.13) are testing conditions on some families of special functions. To make our explanations more clear and also for later purpose, we generalize Theorem 1.8 to a trilinear form. (1.12) is a simple testing condtion on functions of the form f = 1 , but (1.13) is J not that clear. To deduce (1.13) from (3.1), we first note that is constant on each ∆ν b  2 I I 0 ∈ child(I). Let β 0 = µ(I)−1 ∆ν b · 1 0 for each I 0 ∈ child(I). (3.1) becomes II I I   p2 Z X X Z 2 Z p (3.5)  β f dµ 1 0  dν . C |f |p dµ. X II 0 I I X I∈L I 0 ∈child(I) Consider the following generalization of Theorem 1.8 to a trilinear form. Theorem 3.4 (Two weight estimates for a trilinear form). For every sequence of n o non-negative constants β 0 , define the trilinear operator II I∈L,I 0 ∈child(I) X X Z  Z  Z  (3.6) Π(f, g, h) = β f dµ gdµ hdν . 
II 0 I I I0 I∈L I 0 ∈child(I) (3.7) Π(f, g, h) ≤ C||f || ||g|| ||h|| p 0 Lp (µ) Lp (µ) L( 2 ) (ν) holds if and only if (i)   p2 Z X X p (3.8)  β · µ(I)2 · 1 0  dν ≤ C12 · µ(J), J ∈ L, J II 0 I I∈L,I⊆J I 0 ∈child(I) (ii) (3.9)  ( p2 )0 Z 0 X X (p)  β · µ(I) · ν(I 0 ) · 1  dµ ≤ C2 2 · ν(J 0 ), J ∈ L, J 0 ∈ child(J). J0 II 0 I I∈L I 0 ∈child(I),I 0 $J 0 In particular, C ≈ C1 + C2 . 21 Remark 3.5. Note that Theorem 3.4 is written in duality form. In Theorem 3.4,  2 if we choose g = f and β 0 = µ(I)−1 ∆ν b · 1 0 , and take care of the powers of II I I the constants, then we recover Theorem 1.8. All amounts to deduce (3.9) from (3.7). The argument depends on again the Rubio de Francia Algorithm Lemma 2.3. Let f = g = F in (3.7), we obtain X X Z 2 Z  Π(F, F, h) = β 0 F dµ hdν II I I0 I∈L I 0 ∈child(I)   Z X X Z 2 =  β F dµ · 1 0  hdν ≤ C||F ||2 p ||h|| . II 0 I p 0 X I∈L I 0 ∈child(I) I L (µ) L( 2 ) (ν) By Lemma 2.3 with q = 2, we have ||F || ≈ ||f || and Lp (µ) Lp (µ) Z 2 Z Z 2 2 2 F dµ ≥ µ(I) · inf F (x) & µ(I) · F dµ ≥ µ(I) · f 2 dµ, I x∈I I I thus we deduce   Z X X Z  (3.10)  β · µ(I) f 2 dµ · 1 0  hdν . C||f ||2 p ||h|| , II 0 I p 0 X I∈L I 0 ∈child(I) I L (µ) L( 2 ) (ν) which implies   Z X X Z   β · µ(I) hdν · 1  f 2 dν . C||f ||2 p ||h|| . II 0 I p 0 X I∈L I 0 ∈child(I),I 0 $J 0 I0 L (µ) L( 2 ) (ν) Testing on h = 1 0 , we get exactly (3.9). J 5. The case 2 < p < ∞: from trilinear forms to shifted bilinear forms In this section, we give an equivalent statement of Theorem 3.4 in terms of a shifted positive operator. Based on this, we will prove the sufficiency in the next section. We start to understand Theorem 3.4 by two claims. Claim 3.6. Π(f, g, h) ≤ C||f || ||g|| ||h|| p 0 Lp (µ) Lp (µ) L( 2 ) (ν) 22 is equivalent to Π(f, f, h) ≤ C||f ||2 p ||h|| p 0 . L (µ) L( 2 ) (ν) Proof. Only need to see the later implies the former. 
By definition (3.6) and the AM–GM inequality, we have
\[
2\,\Pi(f, g, h) \le \Pi(f, f, h) + \Pi(g, g, h) \le C \Big( \|f\|^2_{L^p(\mu)} + \|g\|^2_{L^p(\mu)} \Big) \|h\|_{L^{(p/2)'}(\nu)}.
\]
Hence, by homogeneity, for every $t > 0$,
\[
2\,\Pi(f, g, h) = 2\,\Pi\big(tf, t^{-1}g, h\big) \le C \Big( t^2 \|f\|^2_{L^p(\mu)} + t^{-2} \|g\|^2_{L^p(\mu)} \Big) \|h\|_{L^{(p/2)'}(\nu)}.
\]
Taking $t^2 = \|g\|_{L^p(\mu)} / \|f\|_{L^p(\mu)}$, we conclude that
\[
\Pi(f, g, h) \le C \|f\|_{L^p(\mu)} \|g\|_{L^p(\mu)} \|h\|_{L^{(p/2)'}(\nu)}. \qquad \square
\]

Claim 3.7. $\Pi(f, f, h) \le C \|f\|^2_{L^p(\mu)} \|h\|_{L^{(p/2)'}(\nu)}$ holds if and only if
\[
(3.11) \quad \sum_{I \in \mathcal{L}} \sum_{I' \in \mathrm{child}(I)} \beta_{II'} \cdot \mu(I) \Big( \int_I f \, d\mu \Big) \Big( \int_{I'} h \, d\nu \Big) \le C \|f\|_{L^{p/2}(\mu)} \|h\|_{L^{(p/2)'}(\nu)}
\]
holds.

Proof. In the last section, we deduced that $\Pi(f, f, h) \le C \|f\|^2_{L^p(\mu)} \|h\|_{L^{(p/2)'}(\nu)}$ implies (3.10), which, replacing $f^2$ by an arbitrary non-negative $f \in L^{p/2}(\mu)$, is exactly (3.11). On the other hand, since $\mu(I) \int_I f^2 \, d\mu \ge \big( \int_I f \, d\mu \big)^2$ by the Cauchy–Schwarz inequality, (3.11) implies $\Pi(f, f, h) \le C \|f\|^2_{L^p(\mu)} \|h\|_{L^{(p/2)'}(\nu)}$. $\square$

Because of the two claims, if we abbreviate $\alpha_{II'} = \beta_{II'} \cdot \mu(I)$ in (3.11) and, instead of assuming $2 < p < \infty$ and working with the exponent $p/2$, we let $1 < p < \infty$ and work with $p$, Theorem 3.4 can be restated as follows.

Theorem 3.8 (Two weight estimates for a shifted positive operator). For every sequence of non-negative constants $\{\alpha_{II'}\}_{I \in \mathcal{L},\, I' \in \mathrm{child}(I)}$, define the shifted positive operator
\[
(3.12) \quad T_\alpha(f, g) = \sum_{I \in \mathcal{L}} \sum_{I' \in \mathrm{child}(I)} \alpha_{II'} \Big( \int_I f \, d\mu \Big) \Big( \int_{I'} g \, d\nu \Big).
\]
Then
\[
(3.13) \quad T_\alpha(f, g) \le C \|f\|_{L^p(\mu)} \|g\|_{L^{p'}(\nu)}
\]
holds if and only if

(i)
\[
(3.14) \quad \int_J \Big( \sum_{I \in \mathcal{L},\, I \subseteq J} \sum_{I' \in \mathrm{child}(I)} \alpha_{II'} \cdot \mu(I) \cdot \mathbf{1}_{I'} \Big)^p d\nu \le C_1^p \cdot \mu(J), \qquad J \in \mathcal{L},
\]
(ii)
\[
(3.15) \quad \int_{J'} \Big( \sum_{I \in \mathcal{L}} \sum_{I' \in \mathrm{child}(I),\, I' \subsetneq J'} \alpha_{II'} \cdot \nu(I') \cdot \mathbf{1}_{I} \Big)^{p'} d\mu \le C_2^{p'} \cdot \nu(J'), \qquad J \in \mathcal{L},\ J' \in \mathrm{child}(J).
\]
In particular, $C \approx C_1 + C_2$.

6. The case 2 < p < ∞: sufficiency

This section is dedicated to proving Theorem 3.8, and hence the sufficiency part of Theorem 1.8 in the case $2 < p < \infty$. The idea of the proof comes from [7], with some new twists. It suffices to consider only $f \ge 0$ and $g \ge 0$.
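To fix ideas before the proof, here is a minimal discrete model of the shifted positive operator (3.12) on a finite dyadic lattice. The interval encoding, the measures, and the coefficients are illustrative choices of mine, not from the text.

```python
# Discrete model of T_alpha(f, g) = sum_I sum_{I' in child(I)}
#   alpha_{I,I'} (int_I f dmu)(int_{I'} g dnu)
# on the dyadic lattice of [0, 1) down to a fixed depth.
# Intervals are encoded as (level, index); f, g, mu, nu are given by
# their values / weights on the finest-level cells (illustrative data).
import random

DEPTH = 6
N = 2 ** DEPTH
random.seed(0)
f = [random.random() for _ in range(N)]
g = [random.random() for _ in range(N)]
mu = [1.0 / N] * N                             # Lebesgue-type weights
nu = [random.random() / N for _ in range(N)]   # an arbitrary positive measure

def cells(level, index):
    """Finest-level cells contained in the dyadic interval (level, index)."""
    w = 2 ** (DEPTH - level)
    return range(index * w, (index + 1) * w)

def integral(h, w, level, index):
    return sum(h[i] * w[i] for i in cells(level, index))

alpha = {}  # alpha[(level, index, side)]; side = 0/1 picks the child I'
for level in range(DEPTH):
    for index in range(2 ** level):
        for side in (0, 1):
            alpha[(level, index, side)] = random.random()

def T_alpha(f, g):
    total = 0.0
    for (level, index, side), a in alpha.items():
        fI = integral(f, mu, level, index)                  # int_I f dmu
        gI = integral(g, nu, level + 1, 2 * index + side)   # int_{I'} g dnu
        total += a * fI * gI
    return total

value = T_alpha(f, g)
```

The operator is positive and bilinear, which is what the splitting arguments below exploit.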
We split the estimate into two parts according to the following splitting condition: L = A ∪ B, where n 0 o (3.16) A = I ∈ L : hf ip · µ(I) ≥ hgip · ν(I) and B = L \ A. I,µ I,ν Standard approximation reasoning allows us to assume that only finitely many terms α are non-zero, so all the sums are finite. II 0 24 For an interval I ∈ L, let Ib denote its parents. Using the splitting condition (3.16), we can write Tα (f, g) = T1 + T2 , where X Z  Z  (3.17) T1 = α f dµ gdν , II b Ib I I∈A X Z  Z  (3.18) T2 = α f dµ gdν . II b Ib I I∈B 6.1. A modified stopping interval construction. To estimate T1 we need to modify a bit the construction of stopping intervals from Section 1. The main feature of the construction is that the stopping intervals well be the intervals I ∈ A, but the stopping criterion will be checked on their parents I. b We start with some interval J (not necessarily in A). For the interval J we define the primary preliminary stopping intervals to be the maximal by inclusion intervals Ib ⊆ J, I ∈ A, such that (3.19) hf i b > 2hf iJ,µ . I,µ Note that different I ∈ A can give the same I, b but this Ib is counted only once. It is obvious that these primary preliminary stopping intervals are disjoint and their total µ-measure is at most µ(J)/2. For each such preliminary stopping interval pick all its children L that belong to A (there is at least one such L), and declare these children to be the stopping intervals. For the children K ∈ / A we continue the process: we will find the maximal by inclusion intervals Ib ⊆ K, I ∈ A satisfying (3.19), and declare these Ib to be the secondary preliminary stopping intervals (note that in the stopping criterion (3.19) we still compare with the average over the original interval J). For these secondary preliminary stopping intervals we add their children L ∈ A to the stopping intervals, and for the children K ∈ / A we continue the precess (again, still comparing the averages with the average over the original interval J). 
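The mechanism behind this construction is the usual one: intervals selected because their average is large must have small total measure. This can be sketched for the basic (unmodified) stopping selection of Section 1: choose the maximal dyadic subintervals whose $f$-average exceeds twice the average over $J$, and check the half-measure packing property. The sketch below omits the parent/child bookkeeping of the modified construction, and the data are illustrative.

```python
# Simplified stopping-interval selection on the dyadic tree of [0, 1):
# pick the maximal dyadic I ⊆ J with <f>_I > 2 <f>_J, and check the
# packing bound  sum |I| < |J| / 2.  Illustrative data.
import random

DEPTH = 10
N = 2 ** DEPTH
random.seed(1)
f = [random.random() ** 3 for _ in range(N)]   # nonnegative data

def avg(level, index):
    w = 2 ** (DEPTH - level)
    return sum(f[index * w:(index + 1) * w]) / w

def stopping(level=0, index=0, threshold=None):
    """Maximal dyadic subintervals whose average exceeds the threshold."""
    if threshold is None:
        threshold = 2 * avg(0, 0)   # twice the average over J = [0, 1)
    if avg(level, index) > threshold:
        return [(level, index)]     # maximal: do not recurse further
    if level == DEPTH:
        return []
    return (stopping(level + 1, 2 * index, threshold)
            + stopping(level + 1, 2 * index + 1, threshold))

selected = stopping()
total_length = sum(2.0 ** -level for level, _ in selected)
```

The packing bound follows since $\sum |I| < \sum \int_I f / (2\langle f\rangle_J) \le |J|/2$ for $f \ge 0$.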
We assumed that the collection A is finite, so at some point the process will stop (no I ∈ A, Ib ⊆ K). We end up with the disjoint collection G ∗ (J) of stopping intervals. 25 Since all the stopping intervals are inside the primary preliminary stopping inter- vals, we can conclude that X 1 (3.20) µ(I) < µ(J). 2 I∈G ∗ (J) S S Let G(J) := I∈G ∗ (J) I. Define E(J) = A(J) \ I∈G ∗ (J) A(I), where A(J) = {I ∈ A : I ⊆ J} . It easily follows from the construction that for any I ∈ E(J) (3.21) hf iI,µ ≤ 2hf iJ,µ . To construct a collection G, we start with G0 of disjoint intervals covering the S b For each J ∈ G0 we run the stopping intervals construction to get the set I∈A I. collection G ∗ (J). The union J∈G0 G ∗ (J) give us the first generation of stopping S ∗ intervals G1∗ . Define inductively Gk+1 = J∈G ∗ G ∗ (J) and put G = k≥1 Gk∗ . S S k Note that the condition (3.20) implies that the collection G satisfies the following Carleson measure condition X (3.22) µ(I) < 2µ(J), J ∈ L. I∈G,I⊆J we also can replace G by G ∪ G0 here, and still have the same estimate. 6.2. Estimation of T1 . We start with the estimation of T1 . Using the modified stopping intervals constructed in the previous subsection and remember that J ∈ G0 is chosen such that J ∈/ A, we obtain X Z  Z  X X Z  Z  α f dµ gdν = α f dµ gdν II b Ib I II b Ib I I∈A J∈G∪G0 I∈E(J) = A + B X X Z  Z  A = α f dµ gdν II b Ib I J∈G∪G0 I∈E(J),I6=J X Z  Z  B = α f dµ gdν II b Ib I I∈G 26 For piece , A by (3.21), we have X X Z  A ≤ 2hf i α · µ(I) b · gdν J,µ II I b J∈G∪G0 I∈E(J),I6=J   X Z X = 2hf i  α · µ(I) b · 1  gdν J,µ II I J b J∈G∪G0 I∈E(J),I6=J = 1 + 2   X Z X 1 = 2hf i  α · µ(I) b · 1  gdν, J,µ II I J\G(J) b J∈G∪G0 I∈E(J),I6=J   X Z X 2 = 2hf i  α · µ(I) b · 1  gdν. 
J,µ II I J∩G(J) b J∈G∪G0 I∈E(J),I6=J To estimate , 1 since the sets J \ G(J) are pairwisely disjoint,  p  1 Z p Z  10 p p0 X X 1 ≤ 2hf i  α · µ(I) b · 1 dν  |g| dν , (3.14) J,µ I J I∈E(J),I6=J II J\G(J) b J∈G∪G0 Z  10 1 p p0 X ≤ 2hf i · C1 · µ(J) · p |g| dν , H¨older’s inequality J,µ J∈G∪G0 J\G(J) " # p1 " # 10 X Z p p0 X ≤ C1 · 2 hf ip · µ(J) |g| dν , (3.4) and disjointness J,µ J\G(J) J∈G∪G0 J∈G∪G0 1+ p1 ≤ C1 · 2 · p0 · ||f || ||g|| 0 . Lp (µ) Lp (ν) 2 since the sets from G ∗ (J) are pairwisely disjoint and G(J) = To estimate , ∪ K, K∈G ∗ (J)   X X Z X 2 = 2hf i  α · µ(I) b · 1  gdν. J,µ II I K b J∈G K∈G ∗ (J) I∈E(J),I6=J 27 Note that the integrand is constant on every K ∈ G ∗ (J). Hence, we obtain    X Z X X 2 = 2hf i  α · µ(I) b · 1  hgi · 1  dν J,µ II I K,ν K J b J∈G∪G0 I∈E(J),I6=J K∈G ∗ (J)  p  1  p0  p10 Z p Z X X X ≤ 2hf i α · µ(I) b · 1 dν  hgi · 1 dν    J,µ I  K,ν K  J I∈E(J),I6=J II J K∈G ∗ (J) b J∈G∪G0  p  1   10 Z p p 0 X X X = 2hf i  α · µ(I) b · 1 dν   hgip · ν(K) . J,µ II I K,ν J I∈E(J),I6=J b J∈G∪G0 K∈G ∗ (J) Using the splitting condition (3.16) for the definition A, we can estimate  p  1   10 Z p p X X X p 2 ≤ 2hf i  α · µ(I) b · 1 dν   hf i · µ(K) J,µ I K,µ J I∈E(J),I6=J II b J∈G∪G0 ∗ K∈G (J)   10 p X 1 X p ≤ 2hf i · C1 · µ(J) ·  p hf i · µ(K) J,µ K,µ J∈G∪G0 K∈G ∗ (J) # p1   10 " p X X X p p ≤ C1 · 2 hf i · µ(J)  hf i · µ(K) , (3.4) J,µ K,µ J∈G∪G0 J∈G∪G0 K∈G ∗ (J) ≤ C1 · 4(p0 )p · ||f ||p p . L (µ) Combine the estimation of 1 and . 2 We conclude 1 1+ A ≤ C1 · 2 p · p0 · ||f || ||g|| 0 + C1 · 4(p0 )p · ||f ||p p . Lp (µ) Lp (ν) L (µ) 28 For piece , B note that X B = α · µ(I) b · ν(I) · hf i · hgi II b I,µ b I,ν I∈G " # p1 " # 10 p 0 X X ≤ αp · µ(I) b p · ν(I) · hf ip p hgi · ν(I) , (3.16) II b I,µ b I,ν I∈G I∈G " # p1 " # 10 X X p ≤ αp · µ(I) b p · ν(I) · hf ip p hf i · µ(I) , (3.4) II b I,µ b I,ν I∈G I∈G " # p1 1 X ≤2 p0 · p · ||g|| 0 · αp · µ(I) b p · ν(I) · hf ip . 
Lp (ν) II b I,µ b I∈G To finish, we need the following lemma. Lemma 3.9. The sequence {α } , I I∈L X α = αp 0 · µ(I)p · ν(I 0 ) I II I 0 ∈child(I) satisfies the Carleson measure condition. Proof. For J ∈ L, we have   p1 p X X X α =  αp 0 · µ(I)p · 1 0  , || · || p ≤ || · || 1 I II I l l I∈L:I⊆J I⊆J I 0 ∈G,I 0 ∈child(I) Lp (ν) p X X ≤ α 0 · µ(I) · 1 0 I⊆J I 0 ∈G,I 0 ∈child(I) II I Lp (ν) p Z X X = α 0 · µ(I) · 1 0 dν, (3.14) J I⊆J I 0 ∈G,I 0 ∈child(I) II I ≤ C1p · µ(J).  Hence, we can estimate 1 B ≤ C1 · 2 p0 · p · p0 · ||f || ||g|| 0 , Lp (µ) Lp (ν) 29 which, together with the estimation of , A imply that 1 T1 ≤ C1 · 21+ p · p0 · ||f || ||g|| 0 + C1 · 4(p0 )p · ||f ||p p Lp (µ) Lp (ν) L (µ) 1 + C1 · 2 p0 · p · p0 · ||f || ||g|| 0 . Lp (µ) Lp (ν) 6.3. Estimation of T2 . Now we take care of the estimation of T2 . The estimation proceeds similar as in subsection 6.2. Using the stopping intervals constructed in section 1 with F = B, we obtain X Z  Z  X X Z  Z  α f dµ gdν = α f dµ gdν II b Ib I II b Ib I I∈B J∈G I∈E(J) = A + B X X Z  Z  A = α f dµ gdν , II b Ib I J∈G I∈E(J),I6=J X Z  Z  B = α f dµ gdν . JJ b Jb J J∈G Note here I 6= J, we can write   X Z X A ≤ 2hgi ·  α · ν(I) · 1  f dµ = 1 + 2 J,ν II Ib J b J∈G I∈E(J),I6=J   X Z X 1 = 2hgi ·  α · ν(I) · 1  f dµ, J,ν II Ib J\G(J) b J∈G I∈E(J),I6=J   X Z X 2 = 2hgi ·  α · ν(I) · 1  f dµ. J,ν II Ib J∩G(J) b J∈G I∈E(J),I6=J 30 To estimate , 1 again since the sets J \ G(J) are pairwisely disjoint,  p0  p10 X Z Z  p1 X 1 ≤ 2hgi α · ν(I) · 1 dν  |f |p dµ , (3.15)   J,ν  J I∈E(J),I6=J II Ib J\G(J) b J∈G " # 10 " # p1 p XZ 0 X p ≤ C2 · 2 hgi · ν(J) |f |p dµ , (3.4) and disjointness J,ν J\G(J) J∈G J∈G 1+ p10 ≤ C2 · 2 · p · ||f || ||g|| 0 . Lp (µ) Lp (ν) 2 note again that the sets from G ∗ (J) are pairwisely disjoint and To estimate , G(J) = ∪ K. Hence, K∈G ∗ (J)   X X Z X 2 = 2hgi  α · ν(I) · 1  f dµ. 
J,ν II Ib K b J∈G K∈G ∗ (J) I∈E(J),I6=J Since the integrand is constant on K, we have    X Z X X 2 = 2hgi ·  α · ν(I) · 1   hf i · 1  dµ J,ν II Ib K,µ K J b J∈G I∈E(J),I6=J K∈G ∗ (J)  p  1 Z p X 1 X ≤ 2hgi · C2 · ν(J) p0 · hf i · 1 dµ , disjointness J,ν K,µ K J∈G J K∈G ∗ (J)   p1 X 1 X = 2hgi · C2 · ν(J) p0 ·  hf ip · ν(K) , splitting condition (3.16) J,ν K,µ J∈G K∈G ∗ (J) " # 10   p1 p 0 0 X X X ≤ C2 · 2 hgip · ν(J)  hgip · ν(K) , (3.4) J,ν K,ν J∈G J∈G K∈G ∗ (J) 0 0 ≤ C2 · 4(p)p · ||g||p 0 . Lp (ν) Combine the estimation of 1 and , 2 we have 1+ p10 0 0 A ≤ C2 · 2 · p · ||f || ||g|| 0 + C2 · 4(p)p · ||g||p . Lp (µ) Lp (ν) 0 Lp (ν) 31 Finally, to estimate , B note again X B = α · µ(J) b · ν(J) · hf i · hgi JJ b J,µ b J,ν J∈G " # p1 " # 10 p p0 X X p b p · ν(J) · hf ip ≤ α · µ(J) hgi · ν(J) , (3.4) JJ b J,µ b J,ν J∈G J∈G " # p1 1 X ≤2 p0 · p · ||g|| 0 · αp · µ(J) b p · ν(J) · hf ip , Lemma 3.9 Lp (ν) JJ b J,µ b J∈G 1 ≤ C1 · 2 p0 · p · p0 · ||f || ||g|| 0 . Lp (µ) Lp (ν) Hence, we deduce that 1+ p10 0 0 T2 ≤ C2 · 2 · p · ||f || ||g|| 0 + C2 · 4(p)p · ||g||p Lp (µ) Lp (ν) 0 Lp (ν) 1 + C1 · 2 p0 · p · p0 · ||f || ||g|| 0 . Lp (µ) Lp (ν) Eventually, we conclude that Tα (f, g) = T1 + T2 1 ≤ C1 · 21+ p · p0 · ||f || ||g|| 0 + C1 · 4(p0 )p · ||f ||p p Lp (µ) Lp (ν) L (µ) 1+ p10 0 0 + C2 · 2 · p · ||f || ||g|| 0 + C2 · 4(p)p · ||g||p Lp (µ) Lp (ν) 0 Lp (ν) 1+ p10 + C1 · 2 · p · p0 · ||f || ||g|| 0 . Lp (µ) Lp (ν) By homogeniety, for every t > 0,   1 p p 1 p0 Tα (f, g) = Tα (tf, g) . (C1 + C2 ) · t ||f || p + ||f || p ||g|| 0 + p0 ||g|| 0 . t L (µ) L (µ) Lp (ν) t Lp (ν) 1 1 Taking t = ||g|| p 0 /||f || p0p , we prove exactly Lp (ν) L (µ) Tα (f, g) . (C1 + C2 ) · ||f || ||g|| 0 . Lp (µ) Lp (ν) 32 CHAPTER 4 Bellman functions on filtered probability spaces I: Burkh¨ older’s hull and Super-solutions In this chapter, we find explicitly a super-solution of the dyadic Carleson Embed- ding Theorem 1.9 via the Burkh¨older’s hull. 
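For orientation, the inequality in question — the dyadic Carleson Embedding Theorem 1.9, $\sum_I \alpha_I \langle f\rangle_I^p \le (p')^p\, C\, \|f\|_{L^p}^p$ for a Carleson sequence $\{\alpha_I\}$ — can be sanity-checked numerically. The sketch below uses an illustrative Carleson sequence on $[0,1)$: $\alpha_I = |I|/(N+1)$ over $N+1$ dyadic levels, for which $\sum_{J \subseteq I} \alpha_J \le |I|$, i.e. $C = 1$.

```python
# Numerical sanity check of the dyadic Carleson Embedding Theorem:
#   sum_I alpha_I <f>_I^p  <=  (p')^p * C * ||f||_{L^p}^p
# on the dyadic lattice of [0, 1), with the illustrative Carleson
# sequence alpha_I = |I| / (DEPTH + 1), so that C = 1.
import random

p = 3.0
p_dual = p / (p - 1.0)
DEPTH = 8
N = 2 ** DEPTH
random.seed(2)
f = [random.random() for _ in range(N)]   # f >= 0, constant on finest cells

lhs = 0.0
for level in range(DEPTH + 1):
    w = 2 ** (DEPTH - level)
    length = 2.0 ** -level
    for index in range(2 ** level):
        avg_f = sum(f[index * w:(index + 1) * w]) / w
        lhs += (length / (DEPTH + 1)) * avg_f ** p

norm_p = sum(x ** p for x in f) / N       # ||f||_{L^p([0,1))}^p
rhs = p_dual ** p * norm_p
```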
We start with some properties of the Bellman function B(F, f, M ; C), in particular, we prove the main inequality. Then, we define and discuss the super-solutions in detail. In the last section, we introduce the Burkh¨older’s hull and solve for a super-solution of Theorem 1.9 via the Burkh¨older’s hull. This chapter proves the existence of the Bellman function B(F, f, M ; C). 1. Properties of the Bellman function B(F, f, M ; C) Proposition 4.1 (Properties of the Bellman function B(F, f, M ; C)). (i) Domain: fp ≤ F and 0 ≤ M ≤ C. (ii) Range: 0 ≤ B(F, f, M ; C) ≤ Cp · C · F . (iii) The main inequality: For all triples (F, f , M ) and (F± , f ± , M± ) belong to the domain f p ≤ F , 0 ≤ M ≤ C, with F = 12 (F+ + F− ), f = 12 (f + + f − ) and M = ∆M + 21 (M+ + M− ), where 0 ≤ ∆M ≤ M , we have 1 (4.1) B(F, f , M ; C) ≥ {B(F+ , f + , M+ ; C) + B(F− , f − , M− ; C)} + ∆M · f p . 2 Proof. (i) follows from the H¨older’s inequality and that {α } is a Carleson I I∈D sequence. (ii) holds if we assume Theorem 1.9 is true. We explain (iii) in more detail. Split the sum in the definition (1.15) of B(F, f, M ) into three pieces X 1 X 1 X |I|−1 α hf ip = |I+ |−1 α hf ip + |I− |−1 α hf ip + |I|−1 α hf ip , J⊆I J J 2 J⊆I J J 2 J⊆I J J I I + − where I± means the right and left halves of I, respectively. 33 Now, we choose f ± on the interval I± that almost give the supremum in the definition (1.15) of B(F± , f± , M± ), i.e. for small ε > 0, X ε |I± |−1 α hf ± ip ≥ B(F± , f± , M± ; C) − , J⊆I± J J 2 and note that |I|−1 α hf ip = ∆M · f, we conclude I I X 1 |I|−1 α hf ip ≥ {B(F+ , f+ , M+ ; C) + B(F− , f− , M− ; C)} − ε + ∆M · fp , J⊆I J J 2 which yields exactly (4.1).  Remark 4.2. From (1.15), we know that the Bellman function B(F, f, M ; C) exists and 0 ≤ B(F, f, M ; C) ≤ Cp · C · F if and only if Theorem 1.9 is true. The sharpness is explained as B(F, f, M ; C) (4.2) sup = (p0 )p . fp ≤F, 0≤M ≤C C ·F 2. Properties of the Super-solutions 2.1. 
The super-solutions and the dyadic Carleson Embedding Theorem.

Definition 4.3. A function satisfying Proposition 4.1 is called a super-solution. We denote a super-solution by $B(F, f, M; C)$.

We have seen that the dyadic Carleson Embedding Theorem 1.9 gives rise to a super-solution $B(F, f, M; C)$. On the other hand, to prove (1.14), and in fact Theorem 1.9, it suffices to find any super-solution. Indeed, pick $f \ge 0$ and $\{\alpha_I\}_{I \in \mathcal{D}}$ satisfying the Carleson condition. For every dyadic interval $I \in \mathcal{D}$, let $F_I, f_I, M_I$ be the corresponding averages
\[
F_I = \langle f^p \rangle_I, \qquad f_I = \langle f \rangle_I, \qquad M_I = |I|^{-1} \sum_{J \subseteq I} \alpha_J.
\]
Note that $F_I = \frac12 (F_{I_+} + F_{I_-})$, $f_I = \frac12 (f_{I_+} + f_{I_-})$ and $M_I = \Delta M_I + \frac12 (M_{I_+} + M_{I_-})$, where $0 \le \Delta M_I = |I|^{-1} \alpha_I \le M_I$. For the interval $I$, the main inequality (4.1) implies
\[
\alpha_I \langle f \rangle_I^p \le |I| B(F_I, f_I, M_I; C) - |I_+| B(F_{I_+}, f_{I_+}, M_{I_+}; C) - |I_-| B(F_{I_-}, f_{I_-}, M_{I_-}; C).
\]
Going $n$ levels down and telescoping, we get the inequality
\[
\sum_{J \subseteq I,\, |J| > 2^{-n}|I|} \alpha_J \langle f \rangle_J^p \le |I| B(F_I, f_I, M_I; C) - \sum_{J \subseteq I,\, |J| = 2^{-n}|I|} |J| B(F_J, f_J, M_J; C) \le |I| B(F_I, f_I, M_I; C) \le C_p \cdot C \cdot |I| F_I = C_p \cdot C \cdot \int_I f^p.
\]
Applying the above estimate to the intervals $[-2^n, 0)$ and $[0, 2^n)$ and taking the limit as $n \to \infty$, we prove exactly (1.14).

Remark 4.4. To prove Theorem 1.9, it all amounts to finding a super-solution $B(F, f, M; C)$. We will see in Section 3 that the least possible constant for which a super-solution $B(F, f, M; C)$ exists is $C_p = (p')^p$.

2.2. Further properties of B(F, f, M; C). We start with the following celebrated theorem in convex analysis. We give a proof for the sake of completeness; for more details, see [29].

Theorem 4.5. Let $f : \Omega \to \mathbb{R}$ be a locally bounded function defined on some convex domain $\Omega \subseteq \mathbb{R}^n$, and suppose $f$ satisfies the midpoint concavity $f\big( \frac{x+y}{2} \big) \ge \frac{f(x) + f(y)}{2}$ for all $x, y \in \Omega$. Then $f$ is concave and locally Lipschitz.

Proof.
For concavity: if $f$ is not concave, then there exist two points $a, b \in \Omega$, with the segment $[a,b] = \{\lambda a + (1-\lambda) b : 0 \le \lambda \le 1\} \subseteq \Omega$, such that the function
\[
\varphi(\lambda) = f(\lambda a + (1-\lambda) b) - \lambda f(a) - (1-\lambda) f(b)
\]
satisfies $-\infty < c = \inf \{ \varphi(\lambda) : 0 \le \lambda \le 1 \} < 0$. Note that we have used here that $\Omega$ is convex and $f$ is locally bounded. Furthermore, $\varphi(0) = \varphi(1) = 0$, and a direct computation shows that $\varphi$ is also midpoint concave. Take $0 < \delta < -c/2$ and pick $0 \le \lambda_0 \le 1$ such that $\varphi(\lambda_0) \le c + \delta$; without loss of generality assume $0 < \lambda_0 \le 1/2$. Then $\varphi(0) = 0$ and $\varphi(2\lambda_0) \ge c$, yet midpoint concavity gives
\[
\varphi(\lambda_0) \le c + \delta < \frac{c}{2} \le \frac{\varphi(0) + \varphi(2\lambda_0)}{2} \le \varphi(\lambda_0),
\]
a contradiction!

For local Lipschitz continuity: given $a \in \Omega$, we can find a ball $B(a, 2r) \subseteq \Omega$ on which $f$ is bounded by a constant $M$. For $x \ne y$ in $B(a, r)$, put $z = y + (r/\alpha)(y - x)$, where $\alpha = \|y - x\|$. Clearly $z \in B(a, 2r)$. Moreover, since $y = \frac{r}{r+\alpha} x + \frac{\alpha}{r+\alpha} z$, from the concavity of $f$ we infer that $f(y) \ge \frac{r}{r+\alpha} f(x) + \frac{\alpha}{r+\alpha} f(z)$. So
\[
f(x) - f(y) \le \frac{\alpha}{r+\alpha} \big( f(x) - f(z) \big) \le \frac{\|y - x\|}{r} \cdot 2M,
\]
and by symmetry $|f(y) - f(x)| \le \frac{\|y - x\|}{r} \cdot 2M$. $\square$

In the case of our main inequality (4.1), first put $F = \frac12(F_+ + F_-)$, $f = \frac12(f_+ + f_-)$ and $M = \frac12(M_+ + M_-)$ (i.e. $\Delta M = 0$), and assume all the triples $(F, f, M)$, $(F_\pm, f_\pm, M_\pm)$ lie in the convex domain $f^p \le F$, $0 \le M \le C$; then we obtain the midpoint concavity of $B(F, f, M; C)$. Applying Theorem 4.5 to the function $B$, we conclude that $B$ is itself concave and locally Lipschitz; in particular, $B$ is a continuous function.

Now let $0 \le \lambda \le 1$ and $F = \lambda F_+ + (1-\lambda) F_-$, $f = \lambda f_+ + (1-\lambda) f_-$, $M = \Delta M + \lambda M_+ + (1-\lambda) M_-$. The main inequality (4.1) and the concavity of $B$ imply that
\[
\Delta M \cdot f^p \le B(F, f, M; C) - B(F, f, M - \Delta M; C) \le B(F, f, M; C) - \big\{ \lambda B(F_+, f_+, M_+; C) + (1-\lambda) B(F_-, f_-, M_-; C) \big\}.
\]
Hence, the Bellman function $B(F, f, M; C)$ is continuous and satisfies
\[
(4.3) \quad B(F, f, M; C) \ge \lambda B(F_+, f_+, M_+; C) + (1-\lambda) B(F_-, f_-, M_-; C) + \Delta M \cdot f^p.
\]
2.3. Regularization of the super-solutions.
As we have seen, the Bellman function B is concave and locally Lipschitz, and thus continuous, but hardly any better than that. Fortunately, we know that the proof of Theorem 1.9 boils down to finding just a super-solution B. We recall the trick of regularization of the super-solutions from [25]. Given a super-solution B(F, f, M ; C) satisfying Proposition 4.1. Let φε , ψε : (0, ∞) → [0, ∞) be any two nonnegative C ∞ functions, such that supp(φε ) ⊆ R∞ R∞ [1, (1 + ε)p ], supp(ψε ) ⊆ [1 + ε, 1 + 2ε] and 0 φε (t) dtt = 0 ψε (t) dtt = 1. Define ZZZ   F f M dudvdw Bε (F, f, M ; C) = B , , ; C φε (u)ψε (v)φε (w) (0,∞)3 u v w uvw ZZZ       F f M dudvdw = B(u, v, w; C)φε ψε φε (0,∞)3 u v w uvw 36 Note that the second representation shows Bε ∈ C ∞ . Since B is continuous, the family of smooth functions {Bε : ε > 0} converges to B pointwisely as ε → 0. To check Proposition 4.1 for Bε . Note that the supports of φε and ψε guarantee that Bε is well-defined in the region {fp ≤ F, 0 ≤ M ≤ C} and an easy calculation shows that 0 ≤ Bε ≤ Cp · C · F . For the main inequality, the first representation and (4.1) imply that 1 Bε (F, f, M ; C) − {Bε (F+ , f+ , M+ ; C) + Bε (F− , f− , M− ; C)} 2 Z (1+ε)p Z 1+2ε Z (1+ε)p p 1 dudvdw ≥ ∆M · f p φε (u)ψε (v)φε (w) 1 1+ε 1 v w uvw 1 ≥ ∆M · fp → ∆M · fp as ε → 0. (1 + 2ε)p (1 + ε)p Hence, the proof of (1.14) given in subsection 2.1 works for the smooth function Bε (F, f, M ; C) as well. In what follows, it suffices to consider only for smooth super- solutions B(F, f, M ; C). 2.4. The main inequality in its infinitesimal version. For a smooth super- solution B(F, f, M ; C), being concave means the second differential d2 B ≤ 0. By the main inequality (4.1), we have: B(F, f, M ) − B(F, f, M − ∆M ) ≥ ∆M · fp , and thus ∂B ∂M ≥ fp . Therefore, the main inequality (4.1) implies the following two infinitesimal ones ∂B (4.4) d2 B(F, f, M ; C) ≤ 0 and (F, f, M ; C) ≥ fp . ∂M Actually, (4.4) is equivalent to the main inequality (4.1). 
Indeed, by (4.4) we can deduce
\[
\Delta M \cdot f^p \le B(F, f, M; C) - B(F, f, M - \Delta M; C) \le B(F, f, M; C) - \frac12 \big\{ B(F_+, f_+, M_+; C) + B(F_-, f_-, M_-; C) \big\},
\]
where the first inequality follows from $\partial B / \partial M \ge f^p$ and the second from the concavity $d^2 B \le 0$, since $(F, f, M - \Delta M)$ is the midpoint of the triples $(F_\pm, f_\pm, M_\pm)$.

3. Finding a super-solution via the Burkhölder's hull

3.1. Burkhölder's hull and some reductions. Assume $B(F, f, M; C)$ is a smooth super-solution. In this section, we present an explicit function $B(F, f, M; C)$ with the help of the Burkhölder's hull.

Definition 4.6. The Burkhölder's hull of $B(F, f, M; C)$ is defined by
\[
(4.5) \quad u(f, M; C) = \sup_F \big\{ B(F, f, M; C) - C_p \cdot C \cdot F \big\}, \qquad f \ge 0,\ 0 \le M \le C.
\]

Remark 4.7. This trick of eliminating one variable is due to D. Burkhölder [27].

It follows from the definition (1.15) of $B(F, f, M; C)$ that
\[
(4.6) \quad B(F, f, M; C) = C \cdot B(F, f, M/C; 1). \qquad \text{(Scaling Property)}
\]
Thus, it suffices to consider only $C = 1$. We adopt the notations $B(F, f, M) = B(F, f, M; C = 1)$ for the Bellman function, $B(F, f, M) = B(F, f, M; C = 1)$ for a super-solution, and $u(f, M) = u(f, M; C = 1)$.

Proposition 4.8. The Burkhölder's hull $u(f, M)$ satisfies the following properties:
(i) $\dfrac{\partial u}{\partial M}(f, M) \ge f^p$, and (ii) $u(f, M)$ is concave.

Proof. The proof follows from the definition (4.5).

(i) From $\frac{\partial B}{\partial M}(F, f, M) \ge f^p$, we conclude that $B(F, f, M + \Delta M) - B(F, f, M) \ge \Delta M \cdot f^p$. Choose $F_0$ that almost gives the supremum in the definition of $u(f, M)$, i.e. for small $\varepsilon > 0$, $B(F_0, f, M) - C_p \cdot F_0 > u(f, M) - \varepsilon$. Then
\[
u(f, M + \Delta M) - u(f, M) \ge \big[ B(F_0, f, M + \Delta M) - C_p \cdot F_0 \big] - \big[ B(F_0, f, M) - C_p \cdot F_0 + \varepsilon \big] = B(F_0, f, M + \Delta M) - B(F_0, f, M) - \varepsilon \ge \Delta M \cdot f^p - \varepsilon.
\]
Letting $\varepsilon \to 0$, we get $\frac{\partial u}{\partial M}(f, M) \ge f^p$.

(ii) We need the following simple lemma.

Lemma 4.9. Let $\varphi(x, y)$ be a concave function and let $\Phi(x) = \sup_y \varphi(x, y)$; then $\Phi(x)$ is also a concave function.

Proof. We need to see that $\Phi(\lambda x_1 + (1-\lambda) x_2) \ge \lambda \Phi(x_1) + (1-\lambda) \Phi(x_2)$ for all $x_1, x_2$ and $0 \le \lambda \le 1$. Again choose $y_1$ and $y_2$ in the definition of $\Phi(x)$, such that for small $\varepsilon > 0$, $\varphi(x_1, y_1) > \Phi(x_1) - \varepsilon$ and $\varphi(x_2, y_2) > \Phi(x_2) - \varepsilon$.
Then
\[
\lambda \Phi(x_1) + (1-\lambda) \Phi(x_2) < \lambda \varphi(x_1, y_1) + (1-\lambda) \varphi(x_2, y_2) + \varepsilon \le \varphi\big( \lambda x_1 + (1-\lambda) x_2,\ \lambda y_1 + (1-\lambda) y_2 \big) + \varepsilon \le \Phi\big( \lambda x_1 + (1-\lambda) x_2 \big) + \varepsilon,
\]
which proves the lemma. $\square$

A direct application of this lemma gives (ii). $\square$

Remark 4.10. From Proposition 4.8 and (4.5), if the dyadic Carleson Embedding Theorem 1.9 holds with constant $C_p$, then there exists a concave function $u(f, M)$ satisfying $\frac{\partial u}{\partial M}(f, M) \ge f^p$ and $-C_p \cdot f^p \le u(f, M) \le 0$. On the other hand, if such a $u(f, M)$ exists, then we can define $B(F, f, M) = u(f, M) + C_p \cdot F$ for $F \ge f^p$, $0 \le M \le 1$, and $B$ is then a super-solution that proves the dyadic Carleson Embedding Theorem with the same constant $C_p$. Hence, the best constant in the dyadic Carleson Embedding Theorem is exactly the best constant for which the function $u(f, M)$ exists.

Now, note that the definition (1.15) of $B(F, f, M; C)$ implies that
\[
(4.7) \quad B(t^p F, t f, M; C) = t^p \cdot B(F, f, M; C) \quad \text{for all } t \ge 0. \qquad \text{(Homogeneity)}
\]
Hence $u(t f, M) = t^p \cdot u(f, M)$, which means $u(f, M)$ can be represented as $u(f, M) = f^p \cdot \varphi(M)$. For such a function $u(f, M)$, the Hessian equals
\[
\begin{pmatrix} p(p-1) f^{p-2} \varphi(M) & p f^{p-1} \varphi'(M) \\ p f^{p-1} \varphi'(M) & f^p \varphi''(M) \end{pmatrix},
\]
so the concavity of $u(f, M)$ is equivalent to the following two inequalities:
\[
\varphi(M) \le 0 \quad \text{and} \quad \varphi \varphi'' - p' (\varphi')^2 \ge 0 \qquad \text{for } 0 \le M \le 1.
\]
The inequality $\frac{\partial u}{\partial M}(f, M) \ge f^p$ means $\varphi'(M) \ge 1$, and $\varphi(M)$ also satisfies $-C_p \le \varphi(M) \le 0$. Hence, our task is to find $\varphi(M)$ such that
(i) $0 \le M \le 1$; (ii) $-C_p \le \varphi(M) \le 0$; (iii) $\varphi'(M) \ge 1$; (iv) $\varphi \varphi'' - p' (\varphi')^2 \ge 0$;
and the least possible constant is $C_p = \inf_\varphi \sup_{0 \le M \le 1} \{ -\varphi(M) \}$.

3.2. The formula of the Burkhölder's hull and an explicit super-solution. We first introduce $\phi(M) = -\varphi(M) \ge 0$; then $\phi(M)$ satisfies
(i) $0 \le M \le 1$; (ii) $0 \le \phi(M) \le C_p$; (iii) $\phi'(M) \le -1$; (iv) $\phi \phi'' - p' (\phi')^2 \ge 0$;
and we need to consider $C_p = \inf_\phi \sup_{0 \le M \le 1} \{ \phi(M) \}$.

Rewrite $\phi \phi'' - p' (\phi')^2 \ge 0$ as $\phi^{p'+1} \cdot \big( \phi' / \phi^{p'} \big)' \ge 0$, or equivalently $\big( \phi' / \phi^{p'} \big)' \ge 0$.
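The Hessian reduction above can be verified symbolically; the following is a sketch using sympy (the symbol names are mine).

```python
# Symbolic check of the Hessian reduction for u(f, M) = f^p * phi(M):
# u is concave iff u_ff <= 0 (i.e. phi <= 0) and det(Hessian) >= 0, and
# the determinant factors as
#   det = p (p-1) f^{2p-2} [ phi phi'' - p' (phi')^2 ],  p' = p/(p-1),
# which recovers the scalar condition  phi phi'' - p' (phi')^2 >= 0.
import sympy as sp

f, M, p = sp.symbols('f M p', positive=True)
phi = sp.Function('phi')

u = f**p * phi(M)
H = sp.hessian(u, (f, M))
det = sp.det(H)

p_dual = p / (p - 1)
target = p * (p - 1) * f**(2*p - 2) * (
    phi(M) * sp.diff(phi(M), M, 2) - p_dual * sp.diff(phi(M), M)**2)

assert sp.simplify(det - target) == 0
```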
Let $\big( \phi' / \phi^{p'} \big)' = g(M) \ge 0$ and denote $G(M) = \int_0^M g$. We can solve
\[
\phi(M) = \left[ \frac{p-1}{C_2 M + C_1 - \int_0^M G} \right]^{p-1},
\]
where $C_1$ and $C_2$ are constants such that $C_2 M + C_1 - \int_0^M G \ge 0$ for $0 \le M \le 1$. Note that $\phi'(M) \le -1 < 0$, so $\sup_{0 \le M \le 1} \phi(M) = \phi(0) = \left[ \frac{p-1}{C_1} \right]^{p-1}$. All we need to do now is to minimize $\left[ \frac{p-1}{C_1} \right]^{p-1}$ among all admissible $\phi(M)$. To this end, we compute
\[
\phi'(M) = - \left[ \frac{p-1}{C_2 M + C_1 - \int_0^M G} \right]^{p} \cdot \big[ C_2 - G(M) \big],
\]
and use again $\phi'(M) \le -1$ with $M = 1$, which yields
\[
C_1 \le -C_2 + \int_0^1 G + (p-1) \cdot \big[ C_2 - G(1) \big]^{1/p}.
\]
Remember that $G'(M) = g(M) \ge 0$, so $G(M)$ is increasing; in particular $\int_0^1 G \le G(1)$, and therefore
\[
C_1 \le - \big[ C_2 - G(1) \big] + (p-1) \cdot \big[ C_2 - G(1) \big]^{1/p}.
\]
An easy calculation shows that the maximum of the right-hand side equals $(p-1) \cdot (p')^{-p'}$, attained when $C_2 = G(1) + (p')^{-p'}$. Therefore $C_1$ is at most $(p-1) \cdot (p')^{-p'}$, and thus $\left[ \frac{p-1}{C_1} \right]^{p-1} \ge (p')^p$.

To write down an explicit super-solution, simply take $G(M) \equiv 0$, $C_2 = (p')^{-p'}$ and $C_1 = (p-1) \cdot (p')^{-p'}$; then
\[
\phi(M) = \left[ \frac{p-1}{C_2 M + C_1 - \int_0^M G} \right]^{p-1} = \frac{p^p}{(p-1) \cdot [M + (p-1)]^{p-1}},
\]
and, recalling the relations $u(f, M) = -f^p \cdot \phi(M)$ and $B(F, f, M) = u(f, M) + C_p F = (p')^p F - f^p \cdot \phi(M)$, we obtain
\[
(4.8) \quad u(f, M) = - \frac{(pf)^p}{(p-1) \cdot [M + (p-1)]^{p-1}},
\]
\[
(4.9) \quad B(F, f, M) = (p')^p F - \frac{(pf)^p}{(p-1) \cdot [M + (p-1)]^{p-1}}.
\]
In the general case, we have $u(f, M; C) = C \cdot u(f, M/C)$ and $B(F, f, M; C) = C \cdot B(F, f, M/C)$. Therefore, we have proved the following theorem.

Theorem 4.11. The Burkhölder's hull of the dyadic Carleson Embedding Theorem 1.9 is given by
\[
(4.10) \quad u(f, M; C) = - \frac{C \cdot (pf)^p}{(p-1) \cdot \big[ \frac{M}{C} + (p-1) \big]^{p-1}}.
\]
A super-solution that gives the sharpness $C_p = (p')^p$ is
\[
(4.11) \quad B(F, f, M; C) = (p')^p F - \frac{C \cdot (pf)^p}{(p-1) \cdot \big[ \frac{M}{C} + (p-1) \big]^{p-1}}.
\]

Remark 4.12. Now that the dyadic Carleson Embedding Theorem 1.9 is proved, the Bellman function $B(F, f, M; C)$ exists with $C_p = (p')^p$.
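The defining properties of the explicit super-solution (4.9) can be checked directly: $\partial B / \partial M = (pf)^p / (M + p - 1)^p \ge f^p$ precisely because $M \le 1$, and $0 \le B \le (p')^p F$ on the domain, with equality $B = 0$ at $F = f^p$, $M = 0$. A numerical sketch (the grid and the choice $p = 3$ are illustrative):

```python
# Check that B(F, f, M) = (p')^p F - (pf)^p / ((p-1)(M + p-1)^{p-1})
# satisfies dB/dM >= f^p and 0 <= B <= (p')^p F on {f^p <= F, 0 <= M <= 1},
# and sample its midpoint concavity.  Illustrative grid, p = 3.
import random

p = 3.0
pd = p / (p - 1.0)                     # p' = p / (p - 1)

def B(F, f, M):
    return pd ** p * F - (p * f) ** p / ((p - 1) * (M + p - 1) ** (p - 1))

def dB_dM(F, f, M):
    # exact derivative of B in M
    return (p * f) ** p / (M + p - 1) ** p

grid = [i / 10.0 for i in range(11)]
ok_derivative = ok_range = True
for f_val in grid:
    for M in grid:                                 # 0 <= M <= 1
        for F in [f_val ** p + t for t in grid]:   # F >= f^p
            ok_derivative &= dB_dM(F, f_val, M) >= f_val ** p - 1e-12
            ok_range &= -1e-12 <= B(F, f_val, M) <= pd ** p * F + 1e-12

random.seed(3)
def point():
    f_val = random.random()
    return (f_val ** p + random.random(), f_val, random.random())

ok_concave = True
for _ in range(500):
    (F1, f1, M1), (F2, f2, M2) = point(), point()
    mid = B((F1 + F2) / 2, (f1 + f2) / 2, (M1 + M2) / 2)
    ok_concave &= mid >= (B(F1, f1, M1) + B(F2, f2, M2)) / 2 - 1e-9
```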
However, the super- solution B(F, f , M ; C) obtained above is not the real Bellman function, since on the boundary F = fp the real Bellman function must satisfy the boundary condition B(F, f, M ; C) = M fp = M F , but the function we constructed does not satisfy this condition. So, this super-solution only touches the real one along some set. For the exact Bellman function B(F, f, M ; C), see [22] and [23]. 41 CHAPTER 5 The Bellman functions on filtered probability spaces II: Remodeling and proof of the main theorems In this chapter, we prove the two main results: Theorem 1.13 and Theorem 1.15. The proof depends on a remodeling of the Bellman function B(F, f, M ; C = 1) for an infinitely refining filtration. 1. Properties of the Bellman function BµF (F, f, M, C) The Bellman function BµF (F, f, M ; C) associated to the martingale Carleson Em- bedding Theorem 1.13 does not formally have the main inequality. But it still satisfies the following properties. Proposition 5.1 (Properties of the Bellman function BµF (F, f, M ; C)). (i) Domain: fp ≤ F and 0 ≤ M ≤ C. (ii) Range: 0 ≤ BµF (F, f, M ; C) ≤ Cp · C · F . (iii) Homogeniety: BµF (tp F, tf, M ; C) = tp · BµF (F, f, M ; C) for all t ≥ 0. (iv) Scaling Property: BµF (F, f, M ; C) = C · BµF (F, f, M/C; 1). (v) BµF (F, f, M ; C) ≥ BµF (F, f, M − ∆M ; C) + ∆M · fp for 0 ≤ ∆M ≤ M . In particular, BµF (F, f, M ; C) is increasing in M . Proof. (i) follows from the H¨older’s inequality and that {αn }n≥0 is a Carleson sequence. (ii) holds if we assume Theorem 1.12 is true. (iii) and (iv) are obtained directly from definition (1.18). We explain (v) in more detail. Choose f ≥ 0 and {αn }n≥0 that almost give the supremum in the definition (1.18), i.e. for small ε > 0, " # X Eµ αn fnp ≥ BµF (F, f, M − ∆M ; C) − ε, n≥0 42 P  P  where Eµ [f p ] = F , Eµ [f ] = f, Eµ n≥0 αn = M − ∆M and Eµ k≥n αk |Fn ≤ C for every n ≥ 0. 
Since 0 ≤ M ≤ C, if we increase α0 to α0 + ∆M then everything is P  retained except we have now Eµ n≥0 α n = M and " # X Eµ αn fnp ≥ BµF (F, f, M − ∆M ; C) − ε + ∆M · fp . n≥0 Letting ε → 0, we obtain BµF (F, f, M ; C) ≥ BµF (F, f, M − ∆M ; C) + ∆M · fp .  2. Remodeling of the Bellman function B(F, f, M ; C = 1) for an infinitely refining filtration In this section, we present a remodeling of the Bellman function B(F, f, M ; C = 1) for an infinitely refining filtration, which is central to the proof of Theorem 1.13 and Theorem 1.15. We use the notation B(F, f, M ) = B(F, f, M ; C = 1) in this and later sections. Consider the unit interval I = [0, 1] ∈ D, let {Ijk : 1 ≤ j ≤ 2k } be its k-th generation desendant by subdividing I into 2k congruent dyadic intervals and denote I10 = I. Starting from the definition (1.15) of the Bellman function B(F, f, M ), we can find a function f ≥ 0 with hf p i = F , hf i = f and a sequence {α } , I I J J⊆I P α = M satisfying the Carleson condition with constant C = 1, such that J⊆I J α hf ip (almost) attains B(F, f, M ). P the sum J⊆I J J To proceed, we further assume that the sequence {α } has only finitely many J J⊆I non-zero terms. Hence, the indices of {α } belong to the collection {Ijk : 1 ≤ k ≤ J J⊆I N, 1 ≤ j ≤ 2k } for some fixed integer N , i.e. for all J ∈ / {Ijk : 1 ≤ k ≤ N, 1 ≤ j ≤ 2k }, we have α = 0. As a consequence, we can think the function f being piecewise J constant on all {IjN : 1 ≤ j ≤ 2N }. Now, let us do the remodeling. Fix a small ε, 0 < ε < 1. Consider a discrete-time filtered probability space (X , F, {Fn }n≥0 , µ). The initial construction is X10 = X , and this is Fn0 -measurable, where n0 = 0. Assume that the Fnk -measurable sets Xjk , 1 ≤ j ≤ 2k are constructed. We want to inductively construct Fnk+1 -measurable 43 sets Xjk+1 , 1 ≤ j ≤ 2k+1 . Take a Fnk -measurable set Xjk . Our construction consists of two steps. The first step is a modification of the set Xjk . 
For the given ε > 0 and Xjk ∈ Fnk , Definition 1.11 guarantees the existence of a real-valued Fnkj -measurable random 2 variable h (nkj > nk ), such that: (i) |h1 | = 1 and (ii) X k |hnk |dµ ≤ ε4 µ(Xjk ). The R E E j condition (ii) is chosen in such a way that n ε o ε (5.1) µ x ∈ Xjk : |hnk | > ≤ µ(Xjk ). 2 2 fk = X k \ x ∈ X k : |h | > ε/2 . So we can conclude |h | ≤ ε/2 on X Let X fk , and j j j nk nk j j fk ) ≤ µ(X k ). moreover, (1 − ε/2) µ(X k ) ≤ µ(X j j k+1 fk ∩{h = 1} and X k+1 = X fk ∩{h = −1}. Since In the second step, we set X2j−1 =X j 2j j R fk ), which gives µ(X k+1 ) − µ(X k+1 ) ≤ ε µ(X fk hdµ ≤ fk |hn |dµ ≤ ε µ(X R fk ) ≤ X X k 2 j 2j−1 2j 2 j j j ε 2 µ(Xjk ), we have ( ) k+1 1 µ(X2j−1 ) µ(X2jk+1 ) 1 (5.2) (1 − ε) ≤ max , ≤ (1 + ε). 2 µ(Xjk ) µ(Xjk ) 2 Do this for all Xjk , 1 ≤ j ≤ 2k and let nk+1 = max{nkj : 1 ≤ j ≤ 2k }. Hence, we construct Fnk+1 -measurable sets Xjk+1 , 1 ≤ j ≤ 2k+1 . Our construction stops when k = N. Now that we have constructed {Xjk : 0 ≤ k ≤ N, 1 ≤ j ≤ 2k }. We can define a new sequence {αn }n≥0 on the space (X , F, µ) as   µ(Xjk )−1 α , if x ∈ Xjk  k Ij αnk =  0, if x ∈ X \ S2k X k  j=1 j and αn = 0 for all n’s different from nk , 1 ≤ k ≤ N . Finally, set the new function fe as fe1 N = f 1 N , 1 ≤ j ≤ 2N , and set fe = 0 on Xj Ij S2N N X \ j=1 Xj . Note that the function fe is also piecewise constant on all {Xjk : 0 ≤ k ≤ N, 1 ≤ j ≤ 2k }. 44 P  P Remark 5.2. This construction guarantees that Eµ n≥0 αn = α =M h i J⊆I J and Eµ fe = hf i = f. Later in subsection 2.2 and subsection 3.2, we use a slightly I modified version of this construction. We will frequently consult to the following proposition.  k+1 k+1  1 µ(X2j−1 ) µ(X2j ) Proposition 5.3. (i) 2 (1 − ε) ≤ max µ(Xjk ) , µ(Xjk ) ≤ 21 (1 + ε). (ii) For every subset E ∈ Fnk and µ(E ∩ Xjk ) > 0, we have ( ) k+1 µ(E ∩ X2j−1 ) µ(E ∩ X2jk+1 ) 1 (5.3) max , ≤ (1 + ε). 
Combined with (i), we have
\[
(5.4)\qquad \max\Bigl\{\frac{\mu(E\cap X_{2j-1}^{k+1})}{\mu(X_{2j-1}^{k+1})},\ \frac{\mu(E\cap X_{2j}^{k+1})}{\mu(X_{2j}^{k+1})}\Bigr\} \le \frac{1+\varepsilon}{1-\varepsilon}\cdot\frac{\mu(E\cap X_j^k)}{\mu(X_j^k)}.
\]
(iii) $(1-\varepsilon)^k \le \dfrac{\mu(X_j^k)}{|I_j^k|} \le (1+\varepsilon)^k$ for all $0\le k\le N$, $1\le j\le 2^k$.
(iv) $(1-\varepsilon)^k\langle f\rangle_{I_j^{N-k}} \le \langle\widetilde{f}\rangle_{X_j^{N-k},\mu} \le (1+\varepsilon)^k\langle f\rangle_{I_j^{N-k}}$ for all $0\le k\le N$, $1\le j\le 2^{N-k}$.

Proof. (i) This is (5.2) from our construction.

(ii) This is an important extension of (i), but we only have the upper bound estimate in this general case. Recall that our construction gives $|h_{n_k}|\le\varepsilon/2$ on $\widetilde{X}_j^k$, so
\[
\Bigl|\int_{E\cap\widetilde{X}_j^k} h\,d\mu\Bigr| \le \int_{E\cap\widetilde{X}_j^k}|h_{n_k}|\,d\mu \le \frac{\varepsilon}{2}\,\mu(E\cap\widetilde{X}_j^k) \le \frac{\varepsilon}{2}\,\mu(E\cap X_j^k),
\]
which is $\bigl|\mu(E\cap X_{2j-1}^{k+1}) - \mu(E\cap X_{2j}^{k+1})\bigr| \le \frac{\varepsilon}{2}\,\mu(E\cap X_j^k)$. So we obtain (5.3). (5.4) follows from (5.3) and (i).

(iii) We prove this by induction. For $k = 0$, we have $\mu(X_1^0) = |I_1^0| = 1$. Assuming (iii) holds for $k$, by (i) we can estimate for $X_{2j}^{k+1}$ (and similarly for $X_{2j-1}^{k+1}$)
\[
(1-\varepsilon)^{k+1} \le \frac{\mu(X_{2j}^{k+1})}{|I_{2j}^{k+1}|} = 2\cdot\frac{\mu(X_{2j}^{k+1})}{\mu(X_j^k)}\cdot\frac{\mu(X_j^k)}{|I_j^k|} \le (1+\varepsilon)^{k+1}.
\]

(iv) Again by induction. For $k = 0$, since $\widetilde{f}\,\mathbf{1}_{X_j^N} = \langle f\rangle_{I_j^N}\mathbf{1}_{X_j^N}$, $1\le j\le 2^N$, and $\widetilde{f} = 0$ on $\mathcal{X}\setminus\bigcup_{j=1}^{2^N} X_j^N$, we have $\langle\widetilde{f}\rangle_{X_j^N,\mu} = \langle f\rangle_{I_j^N}$. Assuming (iv) holds for $k$, by (i) we have
\[
(1-\varepsilon)^{k+1}\langle f\rangle_{I_j^{N-(k+1)}} \le \langle\widetilde{f}\rangle_{X_j^{N-(k+1)},\mu}
= \frac{\mu(X_{2j-1}^{N-k})}{\mu(X_j^{N-(k+1)})}\,\langle\widetilde{f}\rangle_{X_{2j-1}^{N-k},\mu} + \frac{\mu(X_{2j}^{N-k})}{\mu(X_j^{N-(k+1)})}\,\langle\widetilde{f}\rangle_{X_{2j}^{N-k},\mu}
\le (1+\varepsilon)^{k+1}\langle f\rangle_{I_j^{N-(k+1)}}. \qquad\square
\]

3. The Bellman function $\mathcal{B}_{\mu F}(F, \mathbf{f}, M; C)$ of Theorem 1.13

3.1. $\mathcal{B}_{\mu F}(F,\mathbf{f},M) \le \mathcal{B}(F,\mathbf{f},M)$. We show (1.19) for the case $C = 1$; the general case follows from the scaling property. Take the Bellman function $\mathcal{B}(F,\mathbf{f},M)$ of the dyadic Carleson Embedding Theorem. Consider an arbitrary function $f\ge 0$ and an arbitrary Carleson sequence $\{\alpha_n\}_{n\ge 0}$ with $C = 1$. Set, for every $n\ge 0$,
\[
X^n = (F^n,\mathbf{f}^n,M^n) = \Bigl(\mathbb{E}^{\mu}[f^p\,|\,\mathcal{F}_n],\ \mathbb{E}^{\mu}[f\,|\,\mathcal{F}_n],\ \mathbb{E}^{\mu}\Bigl[\sum_{k\ge n}\alpha_k\,\Big|\,\mathcal{F}_n\Bigr]\Bigr).
\]
Fix the initial step
\[
X^0 = \Bigl(\mathbb{E}^{\mu}[f^p],\ \mathbb{E}^{\mu}[f],\ \mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\alpha_n\Bigr]\Bigr) = (F,\mathbf{f},M).
\]
By (1.16), we have $0\le M^n\le 1$.
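To make the process $X^n$ concrete, here is a toy computation on the dyadic filtration of $[0,1]$ with $p = 2$; the helper name and the sample data are hypothetical illustrations, and the bound $M^n \le 1$ for a Carleson sequence with $C = 1$ can be checked directly.

```python
import numpy as np

def bellman_process(f_vals, alpha, N, p=2):
    """X^n = (F^n, f^n, M^n) for the dyadic filtration on 2**N atoms:
    F^n = E[f^p | F_n], f^n = E[f | F_n], M^n = E[sum_{k>=n} alpha_k | F_n].
    alpha[k] is a list of length 2**k (one value per atom of F_k)."""
    f = np.asarray(f_vals, dtype=float)
    process = []
    for n in range(N + 1):
        Fn = (f ** p).reshape(2 ** n, -1).mean(axis=1)
        fn = f.reshape(2 ** n, -1).mean(axis=1)
        # project every later alpha_k down to the atoms of F_n and sum up
        Mn = sum(np.asarray(alpha[k]).reshape(2 ** n, -1).mean(axis=1)
                 for k in range(n, N + 1))
        process.append((Fn, fn, Mn))
    return process

proc = bellman_process([1.0, 2.0, 3.0, 4.0],
                       [[0.25], [0.5, 0.0], [0.25, 0.25, 0.5, 0.5]], N=2)
# Carleson with C = 1 means M^n <= 1 on every atom, for every n:
assert all((q[2] <= 1.0 + 1e-12).all() for q in proc)
```

The same arrays also exhibit the identity $X^n = \mathbb{E}^{\mu}[X^{n+1}\,|\,\mathcal{F}_n] + (0,0,\alpha_n)$ used in Lemma 5.4: the $M$-coordinate at level $n$ equals the conditional average of the level-$(n+1)$ coordinate plus $\alpha_n$.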
Also, $\mathbf{f}^n = f_n$, and for $n\ge 1$ the quantities $F^n$, $\mathbf{f}^n$ and $M^n$ are random variables.

Lemma 5.4. For every $n\ge 0$, we have
\[
\mathbb{E}^{\mu}[\mathcal{B}(X^n)] - \mathbb{E}^{\mu}[\mathcal{B}(X^{n+1})] \ge \mathbb{E}^{\mu}[\alpha_n f_n^p],
\]
where $\mathcal{B}(X^n) = \mathcal{B}(F^n,\mathbf{f}^n,M^n)$.

Proof. Recall that the Bellman function $\mathcal{B}(F,\mathbf{f},M)$ satisfies (4.3). Note also that we have
\[
X^n = \mathbb{E}^{\mu}[X^{n+1}\,|\,\mathcal{F}_n] + (0,0,\alpha_n).
\]
By (4.3) and Jensen's inequality, we deduce
\[
\mathcal{B}(X^n) \ge \mathcal{B}\bigl(\mathbb{E}^{\mu}[X^{n+1}\,|\,\mathcal{F}_n]\bigr) + \alpha_n f_n^p \ge \mathbb{E}^{\mu}[\mathcal{B}(X^{n+1})\,|\,\mathcal{F}_n] + \alpha_n f_n^p.
\]
Taking expectations, we prove exactly $\mathbb{E}^{\mu}[\mathcal{B}(X^n)] - \mathbb{E}^{\mu}[\mathcal{B}(X^{n+1})] \ge \mathbb{E}^{\mu}[\alpha_n f_n^p]$. $\square$

Summing up, we get the inequality
\[
\mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\alpha_n f_n^p\Bigr] \le \sum_{n\ge 0}\bigl(\mathbb{E}^{\mu}[\mathcal{B}(X^n)] - \mathbb{E}^{\mu}[\mathcal{B}(X^{n+1})]\bigr) \le \mathcal{B}(X^0).
\]
Hence we conclude that $\mathcal{B}_{\mu F}(F,\mathbf{f},M) \le \mathcal{B}(F,\mathbf{f},M)$.

3.2. $\mathcal{B}_{\mu F}(F,\mathbf{f},M) = \mathcal{B}(F,\mathbf{f},M)$ for an infinitely refining filtration. To show (1.20), we again consider $C = 1$. Note first that on the boundary $\mathbf{f}^p = F$ we have $\mathcal{B}_{\mu F}(F,\mathbf{f},M) = \mathcal{B}(F,\mathbf{f},M) = MF$. For the case $\mathbf{f}^p < F$, we need to apply the remodeling from Section 2. For technical reasons, we slightly modify our remodeling here.

First, by the continuity of $\mathcal{B}$, there exists $\delta_1 > 0$ such that $\mathbf{f}^p < F - \delta_1$ and $\mathcal{B}(F-\delta_1,\mathbf{f},M)$ is close to $\mathcal{B}(F,\mathbf{f},M)$. Next, by the definition of $\mathcal{B}$, we can find a non-negative function $f$ on the unit interval $I = [0,1]$ with $\langle f^p\rangle_I = F-\delta_1$, $\langle f\rangle_I = \mathbf{f}$, and a sequence $\{\alpha_J\}_{J\subseteq I}$ with $\sum_{J\subseteq I}\alpha_J = M$ satisfying the Carleson condition with constant $C = 1$, such that the sum $\sum_{J\subseteq I}\alpha_J\langle f\rangle_J^p$ (almost) equals $\mathcal{B}(F,\mathbf{f},M)$. Moreover, again by continuity, we can choose a finite subset of $\{\alpha_J\}_{J\subseteq I}$ such that $\sum_{J\subseteq I}\alpha_J = M - \delta_2$ for some $\delta_2 > 0$ and $\sum_{J\subseteq I}\alpha_J\langle f\rangle_J^p$ still (almost) equals $\mathcal{B}(F,\mathbf{f},M)$. For simplicity, we assume exactly
\[
(5.5)\qquad \sum_{J\subseteq I}\alpha_J\langle f\rangle_J^p = \mathcal{B}(F,\mathbf{f},M).
\]
Let the indices of $\{\alpha_J\}_{J\subseteq I}$ belong to the collection $\{I_j^k : 1\le k\le N,\ 1\le j\le 2^k\}$ for some fixed integer $N$. Choose $\varepsilon > 0$ such that $F - \delta_1 \le F/(1+\varepsilon)^N$. We do the remodeling with this $\varepsilon > 0$ to construct $\{X_j^k : 0\le k\le N,\ 1\le j\le 2^k\}$, $\{\alpha_n\}_{n\ge 0}$ and $\widetilde{f}$ on the space $(\mathcal{X},\mathcal{F},\mu)$.
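The measure bookkeeping of the remodeling can be illustrated with a small simulation: each set splits into two children whose measures deviate from an even split by at most $\varepsilon/2$ times the parent's measure, as in (5.2), and iterating this reproduces the bound in Proposition 5.3 (iii). The sketch below is hypothetical (the splitting variable $h$ is replaced by a directly sampled imbalance), not the thesis construction itself.

```python
import random

def split(measure, eps, rng):
    """Split a set of measure m into two children whose measures deviate
    from an even split by at most (eps/2)*m, mimicking (5.2):
    (1 - eps)/2 <= mu(child)/mu(parent) <= (1 + eps)/2."""
    t = rng.uniform(-eps / 2, eps / 2)
    return 0.5 * (1 + t) * measure, 0.5 * (1 - t) * measure

def remodel_measures(N, eps, seed=0):
    """Measures mu(X_j^k), 0 <= k <= N, of the remodeled sets, mu(X) = 1."""
    rng = random.Random(seed)
    levels = [[1.0]]
    for _ in range(N):
        levels.append([c for m in levels[-1] for c in split(m, eps, rng)])
    return levels

# Proposition 5.3 (iii): (1-eps)^k <= mu(X_j^k)/|I_j^k| <= (1+eps)^k,
# where |I_j^k| = 2**(-k):
for k, level in enumerate(remodel_measures(6, 0.1)):
    for m in level:
        assert (1 - 0.1) ** k <= m * 2 ** k <= (1 + 0.1) ** k
```

Note that each split preserves total measure exactly, so every level still sums to $\mu(\mathcal{X}) = 1$; only the ratios $\mu(X_j^k)/|I_j^k|$ drift, and they do so geometrically in $k$, which is exactly why the error factors $(1\pm\varepsilon)^N$ appear throughout this section.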
To proceed, we observe:

Lemma 5.5.
\[
(5.6)\qquad \mathbb{E}^{\mu}\bigl[\widetilde{f}^p\bigr] \le (1+\varepsilon)^N\langle f^p\rangle_I.
\]

Proof. By (iii) of Proposition 5.3,
\[
\mathbb{E}^{\mu}\bigl[\widetilde{f}^p\bigr] = \sum_{j=1}^{2^N}\langle\widetilde{f}^p\rangle_{X_j^N,\mu}\,\mu(X_j^N) \le \sum_{j=1}^{2^N}\langle f^p\rangle_{I_j^N}\,(1+\varepsilon)^N|I_j^N| = (1+\varepsilon)^N\langle f^p\rangle_I. \qquad\square
\]

So (5.6) and $\langle f^p\rangle_I = F - \delta_1 \le F/(1+\varepsilon)^N$ imply that $\mathbb{E}^{\mu}[\widetilde{f}^p] \le F$. Also recall from the remodeling that $\mathbb{E}^{\mu}[\widetilde{f}] = \langle f\rangle_I = \mathbf{f}$. Let us further modify the function $\widetilde{f}$ in the following way. Note that we are working on an infinitely refining filtration (see Definition 1.11). There exists a simple function $g$ behaving like a Haar function, such that $g$ is supported on $X_1^N$, $\langle g\rangle_{X_1^N,\mu} = 0$ and $0 < \mathbb{E}^{\mu}[|g|^p] < \infty$. Consider the continuous function
\[
a(t) = \mathbb{E}^{\mu}\bigl[|\widetilde{f}+tg|^p\bigr].
\]
Thus $a(0)\le F$ and $\lim_{t\to\infty}a(t) = \infty$. Hence we can find $t_0\ge 0$ such that $\mathbb{E}^{\mu}[|\widetilde{f}+t_0g|^p] = F$. Update $\widetilde{f}$ to $\widetilde{f}+t_0g$. We then have $\mathbb{E}^{\mu}[|\widetilde{f}|^p] = F$ and $\mathbb{E}^{\mu}[\widetilde{f}] = \mathbf{f}$. Note that the updated function $\widetilde{f}$ might be negative; however, all the relevant average values we will use are still non-negative.

Now let us discuss the properties of the Carleson sequence $\{\alpha_n\}_{n\ge 0}$. Directly from the remodeling, we know $\mathbb{E}^{\mu}\bigl[\sum_{n\ge 0}\alpha_n\bigr] = \sum_{J\subseteq I}\alpha_J = M - \delta_2$. Moreover, we can prove:

Lemma 5.6. Each term of the non-negative sequence $\{\alpha_n\}_{n\ge 0}$ is $\mathcal{F}_n$-measurable, and
\[
(5.7)\qquad \mathbb{E}^{\mu}\Bigl[\sum_{k\ge n}\alpha_k\,\Big|\,\mathcal{F}_n\Bigr] \le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}} \quad\text{for every } n\ge 0.
\]

Proof. From the construction it is clear that each $\alpha_n$ is non-negative and $\mathcal{F}_n$-measurable. So we need to show that for every $\mathcal{F}_n$-measurable set $E$ we have
\[
\mathbb{E}^{\mu}\Bigl[\sum_{k\ge n}\alpha_k\mathbf{1}_E\Bigr] \le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\,\mu(E).
\]
Denote $k_0 = \min\{k : n_k\ge n\}$. Since $\mathbb{E}^{\mu}\bigl[\sum_{k\ge n}\alpha_k\mathbf{1}_E\bigr] = \mathbb{E}^{\mu}\bigl[\mathbb{E}^{\mu}\bigl[\sum_{k\ge k_0}\alpha_{n_k}\,|\,\mathcal{F}_{n_{k_0}}\bigr]\mathbf{1}_E\bigr]$, it suffices to show
\[
\mathbb{E}^{\mu}\Bigl[\sum_{k\ge k_0}\alpha_{n_k}\,\Big|\,\mathcal{F}_{n_{k_0}}\Bigr] \le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}},
\]
or equivalently, for every $\mathcal{F}_{n_{k_0}}$-measurable set $E$,
\[
\mathbb{E}^{\mu}\Bigl[\sum_{k\ge k_0}\alpha_{n_k}\mathbf{1}_E\Bigr] \le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\,\mu(E).
\]
Now the explicit computation shows
\[
\mathbb{E}^{\mu}\Bigl[\sum_{k\ge k_0}\alpha_{n_k}\mathbf{1}_E\Bigr] = \sum_{k\ge k_0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\,\frac{\mu(E\cap X_j^k)}{\mu(X_j^k)}.
\]
An iteration of (5.4) gives
\[
\frac{\mu(E\cap X_j^k)}{\mu(X_j^k)} \le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^N}\cdot\frac{\mu(E\cap X_l^{k_0})}{\mu(X_l^{k_0})}, \quad\text{whenever } X_j^k\subseteq X_l^{k_0}.
\]
So we can estimate
\[
\mathbb{E}^{\mu}\Bigl[\sum_{k\ge k_0}\alpha_{n_k}\mathbf{1}_E\Bigr]
\le \sum_{l=1}^{2^{k_0}}\frac{(1+\varepsilon)^N}{(1-\varepsilon)^N}\cdot\frac{\mu(E\cap X_l^{k_0})}{\mu(X_l^{k_0})}\sum_{k,j:\,I_j^k\subseteq I_l^{k_0}}\alpha_{I_j^k}
\qquad (\{\alpha_J\}\ \text{Carleson sequence})
\]
\[
\le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^N}\sum_{l=1}^{2^{k_0}}\frac{\mu(E\cap X_l^{k_0})}{\mu(X_l^{k_0})}\,\bigl|I_l^{k_0}\bigr|
\qquad (\text{Proposition 5.3 (iii)})
\]
\[
\le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\sum_{l=1}^{2^{k_0}}\mu(E\cap X_l^{k_0}) \le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\,\mu(E). \qquad\square
\]

To finish, we need one final lemma.

Lemma 5.7.
\[
(5.8)\qquad \mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\alpha_n|\widetilde{f}_n|^p\Bigr] \ge (1-\varepsilon)^{pN}\sum_{J\subseteq I}\alpha_J\langle f\rangle_J^p.
\]

Proof.
\[
\mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\alpha_n|\widetilde{f}_n|^p\Bigr]
= \mathbb{E}^{\mu}\Bigl[\sum_{k\ge 0}\alpha_{n_k}|\widetilde{f}_{n_k}|^p\Bigr]
= \sum_{k\ge 0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\bigl\langle|\widetilde{f}_{n_k}|^p\bigr\rangle_{X_j^k,\mu}
\]
\[
\ge \sum_{k\ge 0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\langle\widetilde{f}_{n_k}\rangle_{X_j^k,\mu}^p
= \sum_{k\ge 0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\langle\widetilde{f}\rangle_{X_j^k,\mu}^p
\]
\[
\ge (1-\varepsilon)^{pN}\sum_{k\ge 0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\langle f\rangle_{I_j^k}^p
= (1-\varepsilon)^{pN}\sum_{J\subseteq I}\alpha_J\langle f\rangle_J^p.
\qquad (\text{Proposition 5.3 (iv)}) \qquad\square
\]

Summarizing, we have constructed a function $\widetilde{f}$ and a Carleson sequence $\{\alpha_n\}_{n\ge 0}$ satisfying (5.7), with $\mathbb{E}^{\mu}[|\widetilde{f}|^p] = F$, $\mathbb{E}^{\mu}[\widetilde{f}] = \mathbf{f}$ and $\mathbb{E}^{\mu}\bigl[\sum_{n\ge 0}\alpha_n\bigr] = \sum_{J\subseteq I}\alpha_J = M - \delta_2$. By (5.5) and (5.8), we deduce
\[
\mathcal{B}_{\mu F}\Bigl(F,\mathbf{f},M-\delta_2;\ C = \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\Bigr)
\ge (1-\varepsilon)^{pN}\sum_{J\subseteq I}\alpha_J\langle f\rangle_J^p
= (1-\varepsilon)^{pN}\,\mathcal{B}(F,\mathbf{f},M).
\]
And Proposition 5.1 (iv) and (v) imply that
\[
\mathcal{B}_{\mu F}\Bigl(F,\mathbf{f},M-\delta_2;\ C = \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\Bigr)
= \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\,\mathcal{B}_{\mu F}\Bigl(F,\mathbf{f},\frac{(1-\varepsilon)^{2N}}{(1+\varepsilon)^N}(M-\delta_2)\Bigr)
\le \frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\,\mathcal{B}_{\mu F}(F,\mathbf{f},M).
\]
Combining the last two displays, $\frac{(1+\varepsilon)^N}{(1-\varepsilon)^{2N}}\,\mathcal{B}_{\mu F}(F,\mathbf{f},M) \ge (1-\varepsilon)^{pN}\,\mathcal{B}(F,\mathbf{f},M)$. Letting $\varepsilon\to 0$, we prove exactly $\mathcal{B}_{\mu F}(F,\mathbf{f},M) \ge \mathcal{B}(F,\mathbf{f},M)$. The other inequality was proved in subsection 3.1.

4. The Bellman function $\widetilde{\mathcal{B}}_{\mu F}(F, \mathbf{f})$ of the maximal operators

4.1. $\widetilde{\mathcal{B}}_{\mu F}(F,\mathbf{f}) \le \mathcal{B}_{\mu F}(F,\mathbf{f},1)$. Let us relate the maximal function (1.21) to the Bellman function $\mathcal{B}_{\mu F}(F,\mathbf{f},M)$. Define a sequence of sets
\[
E_n = \{x\in\mathcal{X} : n \text{ is the smallest non-negative integer such that } f^*(x) = |f_n(x)|\}.
\]
Obviously, $\{E_n\}_{n\ge 0}$ forms a disjoint partition of $\mathcal{X}$. We can compute
\[
\|f^*\|_{L^p(\mathcal{X},\mathcal{F},\mu)}^p = \mathbb{E}^{\mu}[|f^*|^p] = \mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}|f_n|^p\mathbf{1}_{E_n}\Bigr] = \mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\mathbb{E}^{\mu}[\mathbf{1}_{E_n}\,|\,\mathcal{F}_n]\cdot|f_n|^p\Bigr].
\]
Let $\alpha_n = \mathbb{E}^{\mu}[\mathbf{1}_{E_n}\,|\,\mathcal{F}_n]$, $n\ge 0$. The connection between the maximal function (1.21) and $\mathcal{B}_{\mu F}(F,\mathbf{f},M)$ relies on the following simple fact.

Lemma 5.8.
$\{\alpha_n\}_{n\ge 0}$ is a Carleson sequence with $C = 1$ (see Definition 1.10).

Proof. It is clear that each $\alpha_n$ is non-negative and $\mathcal{F}_n$-measurable. Moreover, for every set $E\in\mathcal{F}_n$, we have $\mathbb{E}^{\mu}\bigl[\sum_{k\ge n}\alpha_k\mathbf{1}_E\bigr] = \mathbb{E}^{\mu}\bigl[\sum_{k\ge n}\mathbf{1}_{E_k\cap E}\bigr] \le \mu(E)$. So we prove the claim. $\square$

To prove (1.24), fix $\mathbb{E}^{\mu}[f^p] = F$ and $\mathbb{E}^{\mu}[f] = \mathbf{f}$. Since $\{\alpha_n\}_{n\ge 0}$ is a Carleson sequence with $C = 1$ and $\mathbb{E}^{\mu}\bigl[\sum_{n\ge 0}\alpha_n\bigr] = 1$, we conclude that $\widetilde{\mathcal{B}}_{\mu F}(F,\mathbf{f}) \le \mathcal{B}_{\mu F}(F,\mathbf{f},1)$.

4.2. $\widetilde{\mathcal{B}}_{\mu F}(F,\mathbf{f}) = \mathcal{B}_{\mu F}(F,\mathbf{f},1)$ for an infinitely refining filtration. Again, we appeal to the modified remodeling from subsection 3.2, but only with $M = 1$. Note that
\[
\mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\alpha_n|\widetilde{f}_n|^p\Bigr] = \mathbb{E}^{\mu}\Bigl[\sum_{k\ge 0}\alpha_{n_k}|\widetilde{f}_{n_k}|^p\Bigr] = \sum_{k\ge 0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\bigl\langle|\widetilde{f}_{n_k}|^p\bigr\rangle_{X_j^k,\mu}.
\]
To proceed, we observe:

Lemma 5.9. For every $0\le k\le N$ and $1\le j\le 2^{N-k}$, we have
\[
(5.9)\qquad \widetilde{f}_{n_{N-k}}\mathbf{1}_{X_j^{N-k}} \le \frac{(1+\varepsilon)^k}{(1-\varepsilon)^k}\,\langle\widetilde{f}\rangle_{X_j^{N-k},\mu}.
\]

Proof. First note that $\widetilde{f}_n\mathbf{1}_{X_j^{N-k}} = \widetilde{f}_{n_{N-k}}\mathbf{1}_{X_j^{N-k}}$ for every $0\le k\le N$ and $1\le j\le 2^{N-k}$. We induct on $k$. For $k = 0$, the construction of $\widetilde{f}$ immediately gives $\widetilde{f}_{n_N}\mathbf{1}_{X_j^N} = \langle\widetilde{f}\rangle_{X_j^N,\mu}\mathbf{1}_{X_j^N}$, $1\le j\le 2^N$. Assuming (5.9) holds for $k$, then for every $\mathcal{F}_{n_{N-(k+1)}}$-measurable set $E$ with $E\subseteq X_j^{N-(k+1)}$ and $\mu(E) > 0$, we can estimate
\[
\int_E \widetilde{f}_{n_{N-(k+1)}}\mathbf{1}_{X_j^{N-(k+1)}}\,d\mu = \int_E \widetilde{f}\,d\mu
= \int_{E\cap X_{2j-1}^{N-k}}\widetilde{f}_{n_{N-k}}\,d\mu + \int_{E\cap X_{2j}^{N-k}}\widetilde{f}_{n_{N-k}}\,d\mu
\]
\[
\le \frac{(1+\varepsilon)^k}{(1-\varepsilon)^k}\Bigl(\langle\widetilde{f}\rangle_{X_{2j-1}^{N-k},\mu}\,\mu(E\cap X_{2j-1}^{N-k}) + \langle\widetilde{f}\rangle_{X_{2j}^{N-k},\mu}\,\mu(E\cap X_{2j}^{N-k})\Bigr).
\]
And hence we deduce, for $E\subseteq X_j^{N-(k+1)}$,
\[
\mu(E)^{-1}\int_E \widetilde{f}_{n_{N-(k+1)}}\mathbf{1}_{X_j^{N-(k+1)}}\,d\mu
\le \frac{(1+\varepsilon)^k}{(1-\varepsilon)^k}\Bigl(\langle\widetilde{f}\rangle_{X_{2j-1}^{N-k},\mu}\frac{\mu(E\cap X_{2j-1}^{N-k})}{\mu(E\cap X_j^{N-(k+1)})} + \langle\widetilde{f}\rangle_{X_{2j}^{N-k},\mu}\frac{\mu(E\cap X_{2j}^{N-k})}{\mu(E\cap X_j^{N-(k+1)})}\Bigr)
\]
\[
\le \frac{1}{2}\cdot\frac{(1+\varepsilon)^{k+1}}{(1-\varepsilon)^k}\Bigl(\langle\widetilde{f}\rangle_{X_{2j-1}^{N-k},\mu} + \langle\widetilde{f}\rangle_{X_{2j}^{N-k},\mu}\Bigr)
\qquad (\text{by (5.3)})
\]
\[
\le \frac{(1+\varepsilon)^{k+1}}{(1-\varepsilon)^{k+1}}\,\langle\widetilde{f}\rangle_{X_j^{N-(k+1)},\mu}.
\qquad (\text{by Proposition 5.3 (i)})
\]
Since this is true for every $\mathcal{F}_{n_{N-(k+1)}}$-measurable set $E$ with $E\subseteq X_j^{N-(k+1)}$ and $\mu(E) > 0$, we prove (5.9) for $k+1$. $\square$

Applying (5.9), we have
\[
\mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\alpha_n|\widetilde{f}_n|^p\Bigr] = \sum_{k\ge 0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\bigl\langle|\widetilde{f}_{n_k}|^p\bigr\rangle_{X_j^k,\mu}
\le \frac{(1+\varepsilon)^{pN}}{(1-\varepsilon)^{pN}}\sum_{k\ge 0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\langle\widetilde{f}\rangle_{X_j^k,\mu}^p.
\]
And note that Proposition 5.3 (iii) implies
\[
\sum_{k,j:\,I_j^k\subseteq I_{j_0}^{k_0}}\alpha_{I_j^k} \le \bigl|I_{j_0}^{k_0}\bigr| \le \frac{1}{(1-\varepsilon)^N}\,\mu(X_{j_0}^{k_0}) \quad\text{for every } 0\le k_0\le N,\ 1\le j_0\le 2^{k_0}.
\]
Now let us recall a useful lemma established in [22], formulated in our language.

Lemma 5.10. Suppose $\alpha_{I_j^k}\ge 0$, where $0\le k\le N$, $1\le j\le 2^k$, satisfies
\[
(5.10)\qquad \sum_{k,j:\,I_j^k\subseteq I_{j_0}^{k_0}}\alpha_{I_j^k} \le C\,\mu(X_{j_0}^{k_0})
\]
for some constant $C > 0$. Then we can choose pairwise disjoint measurable sets $A_j^k\subseteq\mathcal{X}$ such that $A_j^k\subseteq X_j^k$ and $\alpha_{I_j^k} = C\,\mu(A_j^k)$.

Proof. Without loss of generality, we can assume $C = 1$. We start at the level $k = N$. Since (5.10) with $C = 1$ implies $\alpha_{I_j^N}\le\mu(X_j^N)$ for every $1\le j\le 2^N$, we can choose $A_j^N\subseteq X_j^N$ such that $\alpha_{I_j^N} = \mu(A_j^N)$. Assuming that we have chosen pairwise disjoint measurable $A_j^k$ for all $k\ge k_0+1$ and $1\le j\le 2^k$, note that (5.10) with $C = 1$ gives
\[
\alpha_{I_{j_0}^{k_0}} + \sum_{k,j:\,I_j^k\subsetneq I_{j_0}^{k_0}}\alpha_{I_j^k} \le \mu(X_{j_0}^{k_0}),
\quad\text{so}\quad
\alpha_{I_{j_0}^{k_0}} \le \mu\Bigl(X_{j_0}^{k_0}\setminus\bigcup_{k,j:\,I_j^k\subsetneq I_{j_0}^{k_0}} A_j^k\Bigr),
\]
and thus we can choose a measurable set $A_{j_0}^{k_0}\subseteq X_{j_0}^{k_0}\setminus\bigcup_{k,j:\,I_j^k\subsetneq I_{j_0}^{k_0}} A_j^k$ such that $\alpha_{I_{j_0}^{k_0}} = \mu(A_{j_0}^{k_0})$. Continuing this process for all the indices proves the lemma. $\square$

By Lemma 5.10 (applied with $C = 1/(1-\varepsilon)^N$), we can estimate
\[
\mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\alpha_n|\widetilde{f}_n|^p\Bigr]
\le \frac{(1+\varepsilon)^{pN}}{(1-\varepsilon)^{pN}}\sum_{k\ge 0}\sum_{j=1}^{2^k}\alpha_{I_j^k}\langle\widetilde{f}\rangle_{X_j^k,\mu}^p
= \frac{(1+\varepsilon)^{pN}}{(1-\varepsilon)^{pN}}\sum_{k\ge 0}\sum_{j=1}^{2^k}\frac{1}{(1-\varepsilon)^N}\,\mu(A_j^k)\,\langle\widetilde{f}_{n_k}\rangle_{X_j^k,\mu}^p
\]
\[
\le \frac{(1+\varepsilon)^{pN}}{(1-\varepsilon)^{(p+1)N}}\sum_{k\ge 0}\sum_{j=1}^{2^k}\mathbb{E}^{\mu}\bigl[|\widetilde{f}^*|^p\mathbf{1}_{A_j^k}\bigr]
\qquad (\text{disjointness})
\]
\[
\le \frac{(1+\varepsilon)^{pN}}{(1-\varepsilon)^{(p+1)N}}\,\mathbb{E}^{\mu}\bigl[|\widetilde{f}^*|^p\bigr].
\]
Applying (5.5) and (5.8) with $M = 1$, together with Theorem 1.13, we have
\[
\frac{(1+\varepsilon)^{pN}}{(1-\varepsilon)^{(p+1)N}}\,\mathbb{E}^{\mu}\bigl[|\widetilde{f}^*|^p\bigr]
\ge \mathbb{E}^{\mu}\Bigl[\sum_{n\ge 0}\alpha_n|\widetilde{f}_n|^p\Bigr]
\ge (1-\varepsilon)^{pN}\sum_{J\subseteq I}\alpha_J\langle f\rangle_J^p
= (1-\varepsilon)^{pN}\,\mathcal{B}(F,\mathbf{f},1) = (1-\varepsilon)^{pN}\,\mathcal{B}_{\mu F}(F,\mathbf{f},1).
\]
Recall that $\mathbb{E}^{\mu}[|\widetilde{f}|^p] = F$ and $\mathbb{E}^{\mu}[\widetilde{f}] = \mathbf{f}$. Letting $\varepsilon\to 0$, we prove exactly $\widetilde{\mathcal{B}}_{\mu F}(F,\mathbf{f}) \ge \mathcal{B}_{\mu F}(F,\mathbf{f},1)$. The other inequality was proved in subsection 4.1.

Bibliography

[1] B. Muckenhoupt, Hardy's inequality with weights, Studia Math. 44 (1972), 31-38.
Collection of articles honoring the completion by Antoni Zygmund of 50 years of scientific activity, I.
[2] E. Sawyer, A characterization of a two-weight norm inequality for maximal operators, Studia Math. 75 (1982), no. 1, 1-11.
[3] E. Sawyer, A characterization of two weight norm inequalities for fractional and Poisson integrals, Trans. Amer. Math. Soc. 308 (1988), no. 2, 533-545.
[4] E. Sawyer and R. Wheeden, Weighted inequalities for fractional integrals on Euclidean and homogeneous spaces, Amer. J. Math. 114 (1992), no. 4, 813-874.
[5] F. Nazarov, S. Treil and A. Volberg, The Bellman function and two weight inequalities for Haar multipliers, J. Amer. Math. Soc. 12 (1999), no. 4, 909-928.
[6] M. Lacey, E. Sawyer and I. Uriarte-Tuero, Two weight inequalities for discrete positive operators, arXiv:0911.3437 [math.CA] (2009).
[7] S. Treil, A remark on two weight estimates for positive dyadic operators, arXiv:1201.1145 [math.CA] (2012).
[8] T. P. Hytönen, The A2 theorem: remarks and complements, arXiv:1212.3840 [math.CA] (2012).
[9] J. Scurry, A characterization of two-weight inequalities for a vector-valued operator, arXiv:1007.3089 [math.CA] (2010).
[10] T. S. Hänninen, Another proof of Scurry's characterization of a two weight norm inequality for a sequence-valued positive dyadic operator, arXiv:1304.7759 [math.CA] (2013).
[11] F. Nazarov, S. Treil and A. Volberg, Two weight inequalities for individual Haar multipliers and other well localized operators, Math. Res. Lett. 15 (2008), no. 3, 583-597.
[12] F. Nazarov, S. Treil and A. Volberg, Two weight estimate for the Hilbert transform and Corona decomposition for non-doubling measures, arXiv:1003.1596 [math.CA] (2010).
[13] M. Lacey, E. Sawyer, C.-Y. Shen and I. Uriarte-Tuero, Two Weight Inequality for the Hilbert Transform: A Real Variable Characterization I, Duke Math. J. 163 (2014), no. 15, 2795-2820.
[14] M. Lacey, Two Weight Inequality for the Hilbert Transform: A Real Variable Characterization II, Duke Math. J. 163 (2014), no. 15, 2821-2840.
[15] M. Lacey, E. Sawyer, C.-Y. Shen, I. Uriarte-Tuero and B. Wick, Two Weight Inequalities for the Cauchy Transform from R to C+, arXiv:1310.4820 [math.CV] (2013).
[16] M. Lacey and B. Wick, Two Weight Inequalities for Riesz Transforms: Uniformly Full Dimension Weights, arXiv:1312.6163 [math.CA] (2013).
[17] S. Treil, Commutators, paraproducts and BMO in non-homogeneous martingale settings, Rev. Mat. Iberoam. 29 (2013), no. 4, 1325-1372.
[18] J. Lai, A new two weight estimates for a vector-valued positive operator, arXiv:1503.06778v2 [math.CA] (2015).
[19] J. Lai and S. Treil, Two weight estimates for paraproducts in non-homogeneous settings, Preprint.
[20] F. Nazarov, S. Treil, The hunt for a Bellman function: applications to estimates of singular integral operators and to other classical problems in harmonic analysis, Algebra i Analiz 8 (1996), no. 5, 32-162 (in Russian); English transl. in St. Petersburg Math. J. 8 (1997), no. 5, 721-824.
[21] F. Nazarov, S. Treil, A. Volberg, Bellman function in stochastic control and harmonic analysis, Systems, approximation, singular integral operators, and related topics (Bordeaux, 2000), 393-423, Oper. Theory Adv. Appl., 129, Birkhäuser, Basel, 2001.
[22] A. Melas, The Bellman functions of dyadic-like maximal operators and related inequalities, Adv. in Math. 192 (2005), no. 2, 310-340.
[23] V. Vasyunin, A. Volberg, Monge-Ampère equation and Bellman optimization of Carleson Embedding Theorems, Amer. Math. Soc. Transl. (2) 226 (2009), 195-238.
[24] J. Lai, The Bellman functions of the Carleson Embedding Theorem and the Doob's martingale inequality, arXiv:1411.5408v3 [math.CA] (2014).
[25] F. Nazarov, A counterexample to Sarason's conjecture, Preprint, Michigan State Univ., 1997, pp. 1-17.
[26] D. Burkholder, Martingales and Fourier analysis in Banach spaces, C.I.M.E. Lectures (Varenna, Como, Italy, 1985), Lecture Notes in Mathematics 1206 (1986), 61-108.
[27] D. Burkholder, Explorations in martingale theory and its applications, École d'Été de Probabilités de Saint-Flour XIX-1989, 1-66, Lecture Notes in Mathematics 1464 (1991), Springer, Berlin.
[28] G. Wang, Sharp maximal inequalities for conditionally symmetric martingales and Brownian motion, Proc. Amer. Math. Soc. 112 (1991), 579-586.
[29] C. Niculescu, L. Persson, Convex functions and their applications: A contemporary approach, CMS Books in Mathematics, 1st edition, Springer, 2005.
[30] J. L. Doob, Stochastic Processes, John Wiley & Sons, 1953. ISBN 0-471-52369-0.